By Yanrong Yang and Guangming Pan

Monash University and Nanyang Technological University 

This paper proposes a new statistic to test independence between two high dimensional random vectors $\mathbf{x}: p_1\times 1$ and $\mathbf{y}: p_2\times 1$. The proposed statistic is based on the sum of regularized sample canonical correlation coefficients of $\mathbf{x}$ and $\mathbf{y}$. The asymptotic distribution of the statistic under the null hypothesis is established as a corollary of general central limit theorems (CLT) for the linear statistics of classical and regularized sample canonical correlation coefficients when $p_1$ and $p_2$ are both comparable to the sample size $n$. As applications of the developed independence test, various types of dependent structures, such as factor models, ARCH models and a general uncorrelated but dependent case, are investigated by simulations. As an empirical application, cross-sectional dependence of daily stock returns of companies between different sections in the New York Stock Exchange (NYSE) is detected by the proposed test.


1. Introduction. A prominent feature of data collection nowadays is that the number of variables is comparable with the sample size. This type of data poses great challenges, because traditional multivariate approaches, which were established for the case where the sample size $n$ tends to infinity while the dimension $p$ remains fixed (see [1]), do not necessarily work. There has been a substantial body of research dealing with high dimensional data, for example, [2, 4, 7, 9, 10, 12].

The importance of the independence assumption for inference arises in many aspects of multivariate analysis. For example, it is often the case in multivariate analysis that a number of variables can be rationally classified into several mutually exclusive categories. When variables can be grouped in such a way, a natural question is whether there is any significant relationship between the groups of variables. In other words, can we claim that the groups are mutually independent, so that further statistical analysis such as classification and testing the hypothesis of equality of mean vectors and covariance matrices can be conducted? When the dimension $p$ is fixed, [20] used the likelihood ratio statistic to test independence for $k$ sets of normally distributed random variables; one may also refer to Chapter 12 of [1] on this point. Relying on the asymptotic theory of sample canonical correlation coefficients, this paper proposes a new statistic to test independence between two high dimensional random vectors.

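Specifically, the aim is to test the hypothesis

$$H_0:\ \mathbf{x}\text{ and }\mathbf{y}\text{ are independent};\quad\text{against}\quad H_1:\ \mathbf{x}\text{ and }\mathbf{y}\text{ are dependent}, \tag{1.1}$$

where $\mathbf{x}=(x_1,\dots,x_{p_1})^T$ and $\mathbf{y}=(y_1,\dots,y_{p_2})^T$. Without loss of generality, suppose that $p_1\le p_2$.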

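It is well known that canonical correlation analysis (CCA) deals with the correlation structure between two random vectors (see Chapter 12 of [1]). Draw $n$ independent and identically distributed (i.i.d.) observations from these two random vectors $\mathbf{x}$ and $\mathbf{y}$, respectively, and group them into the $p_1\times n$ random matrix $\mathbf{X}=(\mathbf{x}_1,\dots,\mathbf{x}_n)=(X_{ij})_{p_1\times n}$ and the $p_2\times n$ random matrix $\mathbf{Y}=(\mathbf{y}_1,\dots,\mathbf{y}_n)=(Y_{ij})_{p_2\times n}$, respectively. CCA seeks the linear combinations $\mathbf{a}^T\mathbf{x}$ and $\mathbf{b}^T\mathbf{y}$ that are most highly correlated, that is, it maximizes

$$\gamma=\operatorname{Corr}(\mathbf{a}^T\mathbf{x},\mathbf{b}^T\mathbf{y})=\frac{\mathbf{a}^T\Sigma_{xy}\mathbf{b}}{\sqrt{\mathbf{a}^T\Sigma_{xx}\mathbf{a}}\sqrt{\mathbf{b}^T\Sigma_{yy}\mathbf{b}}}, \tag{1.2}$$

where $\Sigma_{xx}$ and $\Sigma_{yy}$ are the population covariance matrices of $\mathbf{x}$ and $\mathbf{y}$, respectively, and $\Sigma_{xy}$ is the population covariance matrix between $\mathbf{x}$ and $\mathbf{y}$. After finding the maximal correlation $\gamma_1$ and the associated vectors $\mathbf{a}_1$ and $\mathbf{b}_1$, CCA continues to seek a second pair of linear combinations $\mathbf{a}_2^T\mathbf{x}$ and $\mathbf{b}_2^T\mathbf{y}$ that has the maximal correlation among all linear combinations uncorrelated with $\mathbf{a}_1^T\mathbf{x}$ and $\mathbf{b}_1^T\mathbf{y}$. This procedure can be iterated, and the successive canonical correlation coefficients $\gamma_1,\dots,\gamma_{p_1}$ can be found.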

It turns out that the population canonical correlation coefficients $\gamma_1,\dots,\gamma_{p_1}$ can be recast as the roots of the determinant equation

$$\det(\Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{xy}^T-\gamma^2\Sigma_{xx})=0. \tag{1.3}$$

Regarding this point, one may refer to page 284 of [15]. The roots of the determinant equation above go under many names, because they figure equally in discriminant analysis, canonical correlation analysis and invariant tests of linear hypotheses in the multivariate analysis of variance.
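As a concrete illustration of (1.3), the roots $\gamma^2$ are the eigenvalues of $\Sigma_{xx}^{-1}\Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{xy}^T$. The sketch below computes them with NumPy; the helper name `population_cancorr` and the toy covariance matrices are assumptions made for illustration only.

```python
import numpy as np

def population_cancorr(S_xx, S_yy, S_xy):
    """Nonnegative roots gamma of det(S_xy S_yy^{-1} S_xy^T - gamma^2 S_xx) = 0.

    The gamma^2 are the eigenvalues of S_xx^{-1} S_xy S_yy^{-1} S_xy^T,
    returned here in decreasing order.
    """
    M = np.linalg.solve(S_xx, S_xy @ np.linalg.solve(S_yy, S_xy.T))
    gamma2 = np.clip(np.linalg.eigvals(M).real, 0.0, None)  # drop rounding noise
    return np.sqrt(np.sort(gamma2)[::-1])

# Toy covariances (assumed for illustration): unit variances,
# Corr(x_1, y_1) = 0.5, Corr(x_2, y_2) = 0.3, all other cross-covariances zero.
S_xx = np.eye(2)
S_yy = np.eye(3)
S_xy = np.array([[0.5, 0.0, 0.0],
                 [0.0, 0.3, 0.0]])
gammas = population_cancorr(S_xx, S_yy, S_xy)

# Rescaling the components of x leaves the canonical correlations unchanged.
D = np.diag([2.0, 3.0])
gammas_scaled = population_cancorr(D @ S_xx @ D, S_yy, D @ S_xy)
print(gammas, gammas_scaled)
```

The rescaling check reflects that the roots of (1.3) are invariant under invertible linear transformations of $\mathbf{x}$ (and likewise of $\mathbf{y}$).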
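Traditionally, the sample covariance matrices $\hat\Sigma_{xx}$, $\hat\Sigma_{xy}$ and $\hat\Sigma_{yy}$ are used to replace the corresponding population covariance matrices, and one solves for the nonnegative roots $\rho_1,\rho_2,\dots,\rho_{p_1}$ of the determinant equation

$$\det(\hat\Sigma_{xy}\hat\Sigma_{yy}^{-1}\hat\Sigma_{xy}^T-\rho^2\hat\Sigma_{xx})=0,$$

where

$$\hat\Sigma_{xx}=\frac1n\sum_{i=1}^n(\mathbf{x}_i-\bar{\mathbf{x}})(\mathbf{x}_i-\bar{\mathbf{x}})^T,\qquad \hat\Sigma_{xy}=\frac1n\sum_{i=1}^n(\mathbf{x}_i-\bar{\mathbf{x}})(\mathbf{y}_i-\bar{\mathbf{y}})^T,$$
$$\hat\Sigma_{yy}=\frac1n\sum_{i=1}^n(\mathbf{y}_i-\bar{\mathbf{y}})(\mathbf{y}_i-\bar{\mathbf{y}})^T,\qquad \bar{\mathbf{x}}=\frac1n\sum_{i=1}^n\mathbf{x}_i,\qquad \bar{\mathbf{y}}=\frac1n\sum_{i=1}^n\mathbf{y}_i.$$

However, in some cases it is inappropriate to use these sample covariance matrices in place of the population covariance matrices to test (1.1). We demonstrate such an example in Section 6.3.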

Therefore, in this paper we instead consider the nonnegative roots $r_1,r_2,\dots,r_{p_1}$ of the alternative determinant equation

$$\det(\mathbf{A}_{xy}\mathbf{A}_{yy}^{-1}\mathbf{A}_{xy}^T-r^2\mathbf{A}_{xx})=0, \tag{1.4}$$

where

$$\mathbf{A}_{xx}=\frac1n\mathbf{X}\mathbf{X}^T,\qquad \mathbf{A}_{yy}=\frac1n\mathbf{Y}\mathbf{Y}^T,\qquad \mathbf{A}_{xy}=\frac1n\mathbf{X}\mathbf{Y}^T.$$

As in the random matrix community, we also call $\mathbf{A}_{xx}$, $\mathbf{A}_{yy}$ and $\mathbf{A}_{xy}$ sample covariance matrices. However, whichever sample covariance matrices are used, they are not consistent estimators of the population covariance matrices when the dimensions $p_1$ and $p_2$ are both comparable to the sample size $n$; this is the so-called "curse of dimensionality." As a consequence, it is conceivable that the classical likelihood ratio statistic (see [20] and [1]) does not work well in the high dimensional case (in fact, it is not well defined, as we discuss later).

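Moreover, from (1.4), when $p_1<n$ and $p_2<n$, one can see that $r_1^2,r_2^2,\dots,r_{p_1}^2$ are the eigenvalues of the matrix

$$\mathbf{S}_{xy}=\mathbf{A}_{xx}^{-1}\mathbf{A}_{xy}\mathbf{A}_{yy}^{-1}\mathbf{A}_{xy}^T. \tag{1.5}$$

Evidently, $\mathbf{A}_{xx}^{-1}$ does not exist when $p_1>n$, and $\mathbf{A}_{yy}^{-1}$ does not exist when $p_2>n$. For this reason, we also consider the eigenvalues of the regularized matrix

$$\mathbf{T}_{xy}=(\mathbf{A}_{xx}+t\mathbf{I}_{p_1})^{-1}\mathbf{A}_{xy}\mathbf{A}_{yy}^{+}\mathbf{A}_{xy}^T, \tag{1.6}$$

where $(\mathbf{A}_{xx}+t\mathbf{I}_{p_1})^{-1}=(\frac1n\mathbf{X}\mathbf{X}^T+t\mathbf{I}_{p_1})^{-1}$, $t$ is a positive constant, $\mathbf{I}_{p_1}$ is the $p_1\times p_1$ identity matrix, and $\mathbf{A}_{yy}^{+}$ denotes the Moore–Penrose pseudoinverse of $\mathbf{A}_{yy}$. Hence, $\mathbf{T}_{xy}$ is well defined even when $p_1,p_2>n$. Moreover, $\mathbf{T}_{xy}$ reduces to $\mathbf{S}_{xy}$ when $p_1$ and $p_2$ are both smaller than $n$ and $t=0$.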
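A minimal NumPy sketch of (1.5) and (1.6) follows, assuming mean-zero data stored with observations in columns (so, as in (1.4), no centering is applied). The helper name `sample_cancorr_sq` and the simulated Gaussian data are hypothetical; the sketch only shows how $r_i^2$ and the eigenvalues of $\mathbf{T}_{xy}$ can be computed.

```python
import numpy as np

def sample_cancorr_sq(X, Y, t=0.0):
    """Eigenvalues of S_xy in (1.5) (t = 0, needs p1 < n) or of T_xy in (1.6) (t > 0).

    X is p1 x n and Y is p2 x n with observations in columns; as in (1.4) the data
    are not centred, i.e. they are assumed to have mean zero.
    """
    p1, n = X.shape
    A_xx = X @ X.T / n
    A_yy = Y @ Y.T / n
    A_xy = X @ Y.T / n
    # Moore-Penrose pseudoinverse of A_yy (the ordinary inverse when p2 < n)
    middle = A_xy @ np.linalg.pinv(A_yy) @ A_xy.T
    T = np.linalg.solve(A_xx + t * np.eye(p1), middle)
    return np.sort(np.linalg.eigvals(T).real)

rng = np.random.default_rng(0)
p1, p2, n = 100, 150, 500
X = rng.standard_normal((p1, n))
Y = rng.standard_normal((p2, n))       # independent of X
r2 = sample_cancorr_sq(X, Y)           # r_1^2, ..., r_{p1}^2
r2_reg = sample_cancorr_sq(X, Y, t=1.0)
print(r2.mean(), r2_reg.mean())        # the first is close to p2/n = 0.3
```

All eigenvalues lie in $[0,1]$, and adding $t\mathbf{I}_{p_1}$ can only shrink them, which is why the regularized sum is smaller.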

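We now look at CCA from another perspective. The original random vectors $\mathbf{x}$ and $\mathbf{y}$ can be transformed into new random vectors $\boldsymbol\xi$ and $\boldsymbol\eta$ as

$$\begin{pmatrix}\boldsymbol\xi\\ \boldsymbol\eta\end{pmatrix}=\begin{pmatrix}\mathbf{A}^T&\mathbf{0}\\ \mathbf{0}&\mathbf{B}^T\end{pmatrix}\begin{pmatrix}\mathbf{x}\\ \mathbf{y}\end{pmatrix} \tag{1.7}$$

such that

$$\begin{pmatrix}\mathbf{A}^T&\mathbf{0}\\ \mathbf{0}&\mathbf{B}^T\end{pmatrix}\begin{pmatrix}\Sigma_{xx}&\Sigma_{xy}\\ \Sigma_{xy}^T&\Sigma_{yy}\end{pmatrix}\begin{pmatrix}\mathbf{A}&\mathbf{0}\\ \mathbf{0}&\mathbf{B}\end{pmatrix}=\begin{pmatrix}\mathbf{I}_{p_1}&\mathbf{P}\\ \mathbf{P}^T&\mathbf{I}_{p_2}\end{pmatrix}, \tag{1.8}$$

where $\mathbf{P}=(\mathbf{P}_1,\mathbf{0})$, $\mathbf{P}_1=\operatorname{diag}(\gamma_1,\dots,\gamma_{p_1})$, $\mathbf{A}=\Sigma_{xx}^{-1/2}\mathbf{Q}_1$ and $\mathbf{B}=\Sigma_{yy}^{-1/2}\mathbf{Q}_2$, with $\mathbf{Q}_1$ ($p_1\times p_1$) and $\mathbf{Q}_2$ ($p_2\times p_2$) orthogonal matrices satisfying

$$\Sigma_{xx}^{-1/2}\Sigma_{xy}\Sigma_{yy}^{-1/2}=\mathbf{Q}_1\mathbf{P}\mathbf{Q}_2^T.$$

Hence, testing independence between $\mathbf{x}$ and $\mathbf{y}$ is equivalent to testing independence between $\boldsymbol\xi$ and $\boldsymbol\eta$. The covariance between $\boldsymbol\xi$ and $\boldsymbol\eta$ has the simple expression

$$\operatorname{Cov}(\boldsymbol\xi,\boldsymbol\eta)=\mathbf{P}=(\mathbf{P}_1,\mathbf{0}).$$

In view of this, if the joint distribution of $\mathbf{x}$ and $\mathbf{y}$ is Gaussian, independence between $\mathbf{x}$ and $\mathbf{y}$ is equivalent to the population canonical correlations all vanishing: $\gamma_1=\cdots=\gamma_{p_1}=0$. For details, see Chapter 11 of [11]. A natural criterion for this test is therefore $\sum_{i=1}^{p_1}\gamma_i^2$.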
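The transformation (1.7)–(1.8) can be checked numerically: take symmetric inverse square roots of $\Sigma_{xx}$ and $\Sigma_{yy}$, compute a full singular value decomposition of $\Sigma_{xx}^{-1/2}\Sigma_{xy}\Sigma_{yy}^{-1/2}$, and verify that the transformed covariance has the block form in (1.8). This is a sketch; the helper names and the random positive definite covariance are assumptions for illustration.

```python
import numpy as np

def inv_sqrt(S):
    # Symmetric inverse square root S^{-1/2} of a positive definite matrix
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

def canonical_transform(S_xx, S_yy, S_xy):
    """A = S_xx^{-1/2} Q_1, B = S_yy^{-1/2} Q_2 from the full SVD
    S_xx^{-1/2} S_xy S_yy^{-1/2} = Q_1 P Q_2^T; returns A, B and the singular values."""
    Rx, Ry = inv_sqrt(S_xx), inv_sqrt(S_yy)
    Q1, gamma, Q2t = np.linalg.svd(Rx @ S_xy @ Ry)  # full_matrices=True by default
    return Rx @ Q1, Ry @ Q2t.T, gamma

# A random positive definite joint covariance (hypothetical), split as p1 = 2, p2 = 3
rng = np.random.default_rng(0)
L = rng.standard_normal((5, 5))
Sigma = L @ L.T + 5 * np.eye(5)
p1 = 2
S_xx, S_xy, S_yy = Sigma[:p1, :p1], Sigma[:p1, p1:], Sigma[p1:, p1:]

A, B, gamma = canonical_transform(S_xx, S_yy, S_xy)
P = np.zeros((p1, 5 - p1))
P[:, :p1] = np.diag(gamma)  # P = (P_1, 0) with P_1 = diag(gamma_1, ..., gamma_{p1})
print(np.round(A.T @ S_xy @ B, 10))  # Cov(xi, eta) = P
```

The singular values returned in `gamma` are the population canonical correlations $\gamma_1\ge\cdots\ge\gamma_{p_1}$.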

As pointed out, $r_i$ is not a consistent estimator of the corresponding population version $\gamma_i$ in the high dimensional case. Fortunately, however, the classical sample canonical correlation coefficients $r_1,r_2,\dots,r_{p_1}$, or their regularized analogues, still contain important information, so that hypothesis testing for (1.1) is possible even though the classical likelihood ratio statistic does not work well in the high dimensional case. This is because the limits of the empirical spectral distribution (ESD) of $r_1,\dots,r_{p_1}$ under the null and under the alternative can differ, so that we may use it to distinguish dependence from independence (see the next section). Our approach essentially makes use of integrals of functions with respect to the ESD of the canonical correlation coefficients. The proposed statistic turns out to be the trace of the corresponding matrix, that is, $\sum_{i=1}^{p_1}r_i^2$. In order to apply it to conduct tests, we further propose two modified statistics, obtained either by dividing the total sample into two groups or by estimating the population covariance matrix of $\mathbf{x}$ in a framework of sparsity.

In addition to proposing a statistic for testing (1.1), another contribution of this paper is to establish the limit of the ESD of the regularized sample canonical correlation coefficients and central limit theorems (CLT) for linear functionals of the classical and regularized sample canonical correlation coefficients $r_1,r_2,\dots,r_{p_1}$, respectively. These results are of independent interest in their own right, in addition to providing asymptotic distributions for the proposed statistics.

To derive the CLT for linear spectral statistics of classical and regularized sample canonical correlation coefficients, the strategy is to first establish the CLT in the Gaussian case, that is, when the entries of $\mathbf{X}$ are Gaussian distributed. In the Gaussian case, the CLT for linear spectral statistics of the matrix $\mathbf{S}_{xy}$ can be linked to that of an F-matrix, which has been investigated in [22]. We then extend the CLT to general distributions by bounding the difference between the characteristic functions of the respective linear spectral statistics of $\mathbf{S}_{xy}$ in the Gaussian and non-Gaussian cases. To bound such a difference and to handle the inverse of a random matrix, we use an interpolation approach and a smooth cutoff function. The approach to developing the CLT for linear spectral statistics of the matrix $\mathbf{T}_{xy}$ is similar to that for $\mathbf{S}_{xy}$, except that we first have to develop a CLT for perturbed sample covariance matrices in the supplementary material [23] in order to establish the CLT of $\mathbf{T}_{xy}$ when the entries of $\mathbf{X}$ are Gaussian.

Here we point out some related work on canonical correlation coefficients in the high dimensional scenario. [19] investigated the limit of the ESD of the classical sample canonical correlation coefficients $r_1,r_2,\dots,r_{p_1}$, and [13] established the Tracy–Widom law of the maximum of the sample correlation coefficients when $\mathbf{A}_{xx}$ and $\mathbf{A}_{yy}$ are Wishart matrices and $\mathbf{x}$, $\mathbf{y}$ are independent.

The remainder of the paper is organized as follows. Section 2 proposes a new test statistic for (1.1) based on large dimensional random matrix theory and contains the main results. Two modified statistics are provided in Section 3, where estimators of some unknown parameters in the asymptotic mean and variance are also proposed. Section 4 provides the powers of the test statistics. Two examples of statistical inference based on the independence test are explored in Section 5. Simulation results for several kinds of dependent structures are provided in Section 6. In Section 7, the proposed independence test is used to investigate cross-sectional dependence of daily stock returns of companies from two different sections of the New York Stock Exchange (NYSE). Some useful lemmas and the proofs of all theorems and of Propositions 4–5 are relegated to Appendix A, while a theorem on the CLT of a sample covariance matrix plus a perturbation matrix is provided in Appendix B. All appendices are given in the supplementary material [23].

2. Methodology and theory. Throughout this paper, we make the following assumptions.
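Assumption 1. $p_1=p_1(n)$ and $p_2=p_2(n)$ with $p_1/n\to c_1$ and $p_2/n\to c_2$, $c_1,c_2\in(0,1)$, as $n\to\infty$.

Assumption 2. $p_1=p_1(n)$ and $p_2=p_2(n)$ with $p_1/n\to c_1'$ and $p_2/n\to c_2'$, $c_1'\in(0,+\infty)$ and $c_2'\in(0,+\infty)$, as $n\to\infty$.

Assumption 3. $\mathbf{X}=(X_{ij})_{p_1\times n}$ and $\mathbf{Y}=(Y_{ij})_{p_2\times n}$ satisfy $\mathbf{X}=\Sigma_{xx}^{1/2}\mathbf{W}$ and $\mathbf{Y}=\Sigma_{yy}^{1/2}\mathbf{V}$, where $\mathbf{W}=(\mathbf{w}_1,\dots,\mathbf{w}_n)=(W_{ij})_{p_1\times n}$ consists of i.i.d. real random variables $\{W_{ij}\}$ with $EW_{11}=0$ and $E|W_{11}|^2=1$; $\mathbf{V}=(\mathbf{v}_1,\dots,\mathbf{v}_n)=(V_{ij})_{p_2\times n}$ consists of i.i.d. real random variables with $EV_{11}=0$ and $E|V_{11}|^2=1$; and $\Sigma_{xx}^{1/2}$ and $\Sigma_{yy}^{1/2}$ are Hermitian square roots of the positive definite matrices $\Sigma_{xx}$ and $\Sigma_{yy}$, respectively, so that $(\Sigma_{xx}^{1/2})^2=\Sigma_{xx}$ and $(\Sigma_{yy}^{1/2})^2=\Sigma_{yy}$.

Assumption 4. $F^{\Sigma_{xx}}\xrightarrow{D}H$, a proper cumulative distribution function.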


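Remark 1. By the definition of the matrix $\mathbf{S}_{xy}$, the classical canonical correlation coefficients between $\mathbf{x}$ and $\mathbf{y}$ are the same as those between $\mathbf{w}$ and $\mathbf{v}$, when $\mathbf{w}$ and $\{\mathbf{w}_j\}$ are i.i.d., and $\mathbf{v}$ and $\{\mathbf{v}_j\}$ are i.i.d.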
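Remark 1 rests on the fact that replacing $\mathbf{X}$ by $\mathbf{C}_x\mathbf{X}$ and $\mathbf{Y}$ by $\mathbf{C}_y\mathbf{Y}$ for invertible $\mathbf{C}_x,\mathbf{C}_y$ turns $\mathbf{S}_{xy}$ into the similar matrix $\mathbf{C}_x^{-T}\mathbf{S}_{xy}\mathbf{C}_x^T$. The sketch below checks this numerically; the random matrices standing in for $\Sigma_{xx}^{1/2}$ and $\Sigma_{yy}^{1/2}$ are hypothetical (any invertible matrices work for this check).

```python
import numpy as np

def cancorr_sq(X, Y):
    # r_i^2 = eigenvalues of S_xy = A_xx^{-1} A_xy A_yy^{-1} A_xy^T (p1, p2 < n)
    n = X.shape[1]
    A_xx, A_yy, A_xy = X @ X.T / n, Y @ Y.T / n, X @ Y.T / n
    S = np.linalg.solve(A_xx, A_xy @ np.linalg.solve(A_yy, A_xy.T))
    return np.sort(np.linalg.eigvals(S).real)

rng = np.random.default_rng(1)
p1, p2, n = 5, 8, 60
W = rng.standard_normal((p1, n))
V = rng.standard_normal((p2, n))
# Hypothetical stand-ins for Sigma_xx^{1/2} and Sigma_yy^{1/2}
Cx = rng.standard_normal((p1, p1)) + 3 * np.eye(p1)
Cy = rng.standard_normal((p2, p2)) + 3 * np.eye(p2)
r2_xy = cancorr_sq(Cx @ W, Cy @ V)
r2_wv = cancorr_sq(W, V)
print(np.allclose(r2_xy, r2_wv))
```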


We now introduce some results from random matrix theory. Denote the ESD of any $n\times n$ matrix $\mathbf{A}$ with real eigenvalues $\mu_1\le\mu_2\le\cdots\le\mu_n$ by

$$F^{\mathbf{A}}(x)=\frac1n\#\{i:\mu_i\le x\}, \tag{2.1}$$

where $\#\{\cdots\}$ denotes the cardinality of the set $\{\cdots\}$.
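Definition (2.1) translates directly into code: sort the eigenvalues once and count how many are at most $x$. A minimal sketch, with the helper name `esd` chosen for illustration:

```python
import numpy as np

def esd(eigenvalues):
    """Return F^A as a function: F^A(x) = #{i : mu_i <= x} / n."""
    mu = np.sort(np.asarray(eigenvalues, dtype=float))
    def F(x):
        # searchsorted with side="right" counts the eigenvalues <= x
        return np.searchsorted(mu, x, side="right") / mu.size
    return F

A = np.diag([1.0, 2.0, 3.0])
F = esd(np.linalg.eigvalsh(A))
print(F(2.0), F(0.5), F(3.0))  # 2/3, 0.0, 1.0
```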

When the two random vectors $\mathbf{x}$ and $\mathbf{y}$ are independent and each of them consists of i.i.d. Gaussian random variables, under Assumptions 1 and 3, [19] proved that the empirical measure of the classical sample canonical correlation coefficients $r_1,r_2,\dots,r_{p_1}$ converges in probability to a fixed distribution whose density is given by

$$p(x)=\frac{\sqrt{(x-L_1)(x+L_1)(L_2-x)(L_2+x)}}{\pi c_1x(1-x)(1+x)},\qquad x\in[L_1,L_2], \tag{2.2}$$

with an atom of size $\max(0,1-c_2/c_1)$ at zero and an atom of size $\max(0,1-(1-c_2)/c_1)$ at unity, where $L_1=|\sqrt{c_2-c_2c_1}-\sqrt{c_1-c_1c_2}|$ and $L_2=\sqrt{c_2-c_2c_1}+\sqrt{c_1-c_1c_2}$. Here, the empirical measure of $r_1,r_2,\dots,r_{p_1}$ is defined as in (2.1) with $\mu_i$ replaced by $r_i$.
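The continuous part of (2.2) can be checked numerically. With $c_1+c_2<1$ there is no atom at unity, so $p(x)$ should integrate to one, and, since $\mathbf{S}_{xy}$ has eigenvalues $r_i^2$ whose average has mean $p_2/n$ for Gaussian data, $\int x^2p(x)\,dx$ should equal $c_2$. The sketch below assumes these facts and uses a cosine substitution plus the midpoint rule to handle the square-root behaviour at $L_1$ and $L_2$; all helper names are hypothetical.

```python
import numpy as np

def wachter_support(c1, c2):
    # End points L1, L2 of the continuous part of (2.2)
    a = np.sqrt(c2 - c2 * c1)
    b = np.sqrt(c1 - c1 * c2)
    return abs(a - b), a + b

def wachter_density(x, c1, c2):
    # Continuous part p(x) of (2.2) on [L1, L2]
    L1, L2 = wachter_support(c1, c2)
    num = np.sqrt(np.clip((x - L1) * (x + L1) * (L2 - x) * (L2 + x), 0.0, None))
    return num / (np.pi * c1 * x * (1 - x) * (1 + x))

def integrate_on_support(f, c1, c2, N=4000):
    # x = m + h cos(theta) absorbs the square-root behaviour at L1 and L2;
    # the midpoint rule in theta is then very accurate.
    L1, L2 = wachter_support(c1, c2)
    m, h = (L1 + L2) / 2, (L2 - L1) / 2
    theta = (np.arange(N) + 0.5) * np.pi / N
    x = m + h * np.cos(theta)
    return np.sum(f(x) * h * np.sin(theta)) * np.pi / N

c1, c2 = 0.2, 0.3  # c1 + c2 < 1: no atom at unity, so the density has mass one
mass = integrate_on_support(lambda x: wachter_density(x, c1, c2), c1, c2)
mean_r2 = integrate_on_support(lambda x: x**2 * wachter_density(x, c1, c2), c1, c2)
print(mass, mean_r2)               # close to 1 and to c2 = 0.3
print(wachter_support(0.5, 0.5))   # L2 = 1 when c1 = c2 = 1/2
```

The last line reproduces the boundary case used below to explain why the likelihood ratio statistic becomes unstable.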

[21] proved that (2.2) also holds for classical sample canonical correlation coefficients when the entries of $\mathbf{x}$ and $\mathbf{y}$ are not necessarily Gaussian distributed. For easy reference, we state the result in the following proposition.








REGULARIZED CANONICAL CORRELATION COEFFICIENTS 




Proposition 1. In addition to Assumptions 1 and 3, suppose that $\{X_{ij},1\le i\le p_1,1\le j\le n\}$ and $\{Y_{ij},1\le i\le p_2,1\le j\le n\}$ are independent. Then the empirical measure of $r_1,r_2,\dots,r_{p_1}$ converges almost surely to a fixed distribution function whose density is given by (2.2).


Under Assumptions 2–4, instead of $F^{\mathbf{S}_{xy}}$, we analyze the ESD, $F^{\mathbf{T}_{xy}}$, of the regularized random matrix $\mathbf{T}_{xy}$ given in (1.6). To this end, define the Stieltjes transform of any distribution function $G(x)$ by

$$m_G(z)=\int\frac{1}{x-z}\,dG(x),\qquad z\in\mathbb{C}^+=\{z\in\mathbb{C}:\Im z>0\},$$

where $\Im z$ denotes the imaginary part of the complex number $z$.
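For an ESD, the Stieltjes transform is simply an average over eigenvalues, $m(z)=\frac1n\sum_i(\mu_i-z)^{-1}$. A small sketch (the helper name is hypothetical) with a hand-checkable value:

```python
import numpy as np

def stieltjes_esd(eigenvalues, z):
    """m(z) = integral of 1/(x - z) dF^A(x) = (1/n) * sum_i 1/(mu_i - z), for Im z > 0."""
    mu = np.asarray(eigenvalues, dtype=float)
    return np.mean(1.0 / (mu - z))

mu = np.array([1.0, 2.0, 3.0])
z = 2.0 + 1.0j
m = stieltjes_esd(mu, z)
print(m)  # equals 2j/3 here; in general Im m(z) > 0 whenever Im z > 0
```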

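It turns out that the limit of the empirical spectral distribution (LSD) of $\mathbf{T}_{xy}$ is connected to the LSD of $\mathbf{S}_1\mathbf{S}_{2t}^{-1}$ defined below. Let

$$\mathbf{S}_1=\frac{1}{p_2}\sum_{k=1}^{p_2}\mathbf{w}_k\mathbf{w}_k^T,\qquad \mathbf{S}_{2t}=\frac{1}{n-p_2}\sum_{k=p_2+1}^{n}\mathbf{w}_k\mathbf{w}_k^T+\frac{tn}{n-p_2}\Sigma_{xx}^{-1},$$
$$y_1=\frac{c_1'}{c_2'},\qquad y_2=\frac{c_1'}{1-c_2'}.$$

In the definition of $\mathbf{S}_{2t}$, we require $n>p_2$. The LSD of $\mathbf{S}_{2t}$ and its Stieltjes transform are denoted by $F_{y_2,t}$ and $m_{y_2,t}(z)$, respectively. Under Assumptions 2–4, from [17] and [16], $m_{y_2,t}(z)$ is the unique solution in $\mathbb{C}^+$ to

$$m_{y_2,t}(z)=m_{H_t}\Big(z-\frac{1}{1+y_2m_{y_2,t}(z)}\Big), \tag{2.3}$$

where $m_{H_t}(z)$ denotes the Stieltjes transform of the LSD $H_t$ of the matrix $\frac{tn}{n-p_2}\Sigma_{xx}^{-1}$ (one may also see (1.4) in the supplementary material [23]). Let $\mathbf{n}=(n_1,n_2)$ and $\mathbf{y}=(y_1,y_2)$ with $n_1=p_2$ and $n_2=n-p_2$. The Stieltjes transforms of the ESD and LSD of the matrix $\mathbf{S}_1\mathbf{S}_{2t}^{-1}$ are denoted by $m_{\mathbf{n}}(z)$ and $m_{\mathbf{y}}(z)$, respectively, while those of the ESD and LSD of the matrix $\frac{1}{p_2}\mathbf{W}_1^T\mathbf{S}_{2t}^{-1}\mathbf{W}_1$ are denoted by $\underline{m}_{\mathbf{n}}(z)$ and $\underline{m}_{\mathbf{y}}(z)$, respectively, where $\mathbf{W}_1=(\mathbf{w}_1,\mathbf{w}_2,\dots,\mathbf{w}_{p_2})$. Observe that the spectra of $\mathbf{S}_1\mathbf{S}_{2t}^{-1}$ and $\frac{1}{p_2}\mathbf{W}_1^T\mathbf{S}_{2t}^{-1}\mathbf{W}_1$ are the same except for zero eigenvalues, which leads to

$$\underline{m}_{\mathbf{y}}(z)=-\frac{1-y_1}{z}+y_1m_{\mathbf{y}}(z). \tag{2.4}$$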
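Relation (2.4) holds exactly at the level of ESDs, because $\mathbf{S}_1\mathbf{S}_{2t}^{-1}$ ($p_1\times p_1$) and $\frac{1}{p_2}\mathbf{W}_1^T\mathbf{S}_{2t}^{-1}\mathbf{W}_1$ ($p_2\times p_2$) share their nonzero eigenvalues, and the larger matrix has $p_2-p_1$ additional zeros. The sketch below checks this at one point $z$; the sizes and the positive definite stand-in for $\mathbf{S}_{2t}$ are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
p1, p2, m2 = 40, 100, 300                 # hypothetical sizes with p1 < p2
W1 = rng.standard_normal((p1, p2))
W2 = rng.standard_normal((p1, m2))
S1 = W1 @ W1.T / p2
S2 = W2 @ W2.T / m2 + 0.5 * np.eye(p1)    # any positive definite stand-in for S_2t

def stieltjes(eigs, z):
    return np.mean(1.0 / (eigs - z))

z = 0.7 + 0.3j
y1 = p1 / p2
eig_small = np.linalg.eigvals(S1 @ np.linalg.inv(S2)).real              # p1 eigenvalues
eig_big = np.linalg.eigvalsh(W1.T @ np.linalg.solve(S2, W1) / p2)        # p2 eigenvalues
lhs = stieltjes(eig_big, z)                                              # underline-m
rhs = -(1 - y1) / z + y1 * stieltjes(eig_small, z)                       # right side of (2.4)
print(np.isclose(lhs, rhs))
```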

We are now in a position to state the LSD of Txy. 


Theorem 2.1. In addition to Assumptions 2–4, suppose that $\{X_{ij},1\le i\le p_1,1\le j\le n\}$ and $\{Y_{ij},1\le i\le p_2,1\le j\le n\}$ are independent.

(a) If $c_2'\in(0,1)$, then the ESD $F^{\mathbf{T}_{xy}}(\lambda)$ converges almost surely to the fixed distribution $F\big(\frac{\lambda}{q(1-\lambda)}\big)$, with $q=\frac{c_2'}{1-c_2'}$, where $F(\lambda)$ is a nonrandom distribution whose Stieltjes transform $m_{\mathbf{y}}(z)$ is the unique solution in $\mathbb{C}^+$ to

$$m_{\mathbf{y}}(z)=\int\frac{dF_{y_2,t}(x)}{x^{-1}\big(1-y_1-y_1zm_{\mathbf{y}}(z)\big)-z}. \tag{2.5}$$

(b) If $c_2'\in[1,\infty)$, then $F^{\mathbf{T}_{xy}}(\lambda)$ converges almost surely to the fixed distribution $G\big(\frac{t\lambda}{1-\lambda}\big)$, where $G(\lambda)$ is a nonrandom distribution whose Stieltjes transform satisfies

$$m_G(z)=\int\frac{dH(\lambda)}{\lambda\big(1-c_1'-c_1'zm_G(z)\big)-z}. \tag{2.6}$$

Remark 2. Indeed, taking $t=0$ in (2.5) recovers the result of [19] (one may refer to the result on the F-matrix in [5]).

Let us now introduce the test statistic. Under Assumptions 1 and 3, our test statistic rests on the observation that the limit of $F^{\mathbf{S}_{xy}}(x)$ can be obtained from (2.2) when $\mathbf{x}$ and $\mathbf{y}$ are independent, while the limit of $F^{\mathbf{S}_{xy}}(x)$ can differ from (2.2) when $\mathbf{x}$ and $\mathbf{y}$ are correlated. For example, if $\mathbf{y}=\Sigma_1\mathbf{w}$ and $\mathbf{x}=\Sigma_2\mathbf{w}$ with $p_1=p_2$ and both $\Sigma_1$ and $\Sigma_2$ invertible, then

$$\mathbf{S}_{xy}=\mathbf{I},$$

which implies that the limit of $F^{\mathbf{S}_{xy}}(x)$ is a degenerate distribution. This suggests that we may make use of $F^{\mathbf{S}_{xy}}(x)$ to construct a test statistic. Thus, we consider the statistic

$$\int\phi(x)\,dF^{\mathbf{S}_{xy}}(x)=\frac{1}{p_1}\sum_{i=1}^{p_1}\phi(r_i^2). \tag{2.7}$$


A perplexing problem is how to choose an appropriate function $\phi(x)$. For simplicity, we choose $\phi(x)=x$ in this work. That is, our statistic is

$$S_n=\int x\,dF^{\mathbf{S}_{xy}}(x)=\frac{1}{p_1}\sum_{i=1}^{p_1}r_i^2. \tag{2.8}$$

Indeed, extensive simulations based on Theorems 2.2 and 2.3 below have been conducted to help select an appropriate function $\phi(x)$. We find that other choices of $\phi(x)$ do not have an advantage over $\phi(x)=x$.
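The sketch below computes $S_n$ in (2.8) as $\operatorname{tr}(\mathbf{S}_{xy})/p_1$ in the two situations just described: independent Gaussian $\mathbf{x}$ and $\mathbf{y}$ (where $S_n$ is close to $p_2/n$) and the degenerate example $\mathbf{x}=\Sigma_2\mathbf{w}$, $\mathbf{y}=\Sigma_1\mathbf{w}$ (where $\mathbf{S}_{xy}=\mathbf{I}$, so $S_n=1$). The particular invertible $\Sigma_1,\Sigma_2$ and the sizes are hypothetical choices.

```python
import numpy as np

def statistic_S_n(X, Y):
    """S_n = (1/p1) * sum_i r_i^2 = tr(S_xy) / p1, for p1, p2 < n (no centering)."""
    n = X.shape[1]
    A_xx, A_yy, A_xy = X @ X.T / n, Y @ Y.T / n, X @ Y.T / n
    S = np.linalg.solve(A_xx, A_xy @ np.linalg.solve(A_yy, A_xy.T))
    return np.trace(S) / X.shape[0]

rng = np.random.default_rng(3)
p, n = 50, 400
W = rng.standard_normal((p, n))
indep = statistic_S_n(W, rng.standard_normal((p, n)))   # near p/n = 0.125
# Degenerate example from the text: x = Sigma_2 w, y = Sigma_1 w (hypothetical choices)
Sigma1 = np.diag(np.linspace(1.0, 2.0, p))
Sigma2 = np.eye(p) + 0.1 * np.ones((p, p))
dep = statistic_S_n(Sigma2 @ W, Sigma1 @ W)             # equals 1 up to rounding
print(indep, dep)
```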

In classical CCA, the likelihood ratio test statistic for (1.1) with fixed dimensions is

$$\mathrm{MLR}_n=\sum_{i=1}^{p_1}\log(1-r_i^2) \tag{2.9}$$

(see [20] and [1]). That is, $\phi(x)$ in (2.7) is taken to be $\log(1-x)$. Note that, by (2.2), the limiting distribution has an atom of size $\max(0,1-(1-c_2)/c_1)$ at unity. Thus, the normalized statistic $\frac{1}{p_1}\mathrm{MLR}_n$ is not well defined when $c_1+c_2>1$ [because $\int\log(1-x^2)p(x)\,dx$ is not meaningful]. In addition, even when $c_1+c_2\le1$, the right endpoint $L_2$ of $p(x)$ can be equal to one, so that some sample correlation coefficients $r_i$ are close to one; for example, $L_2=1$ when $c_1=c_2=1/2$. This in turn produces large values of the corresponding $\log(1-r_i^2)$. Therefore, $\mathrm{MLR}_n$ is not stable, and this phenomenon is also confirmed by our simulations.

Under Assumptions 2–4, we substitute $\mathbf{T}_{xy}$ for $\mathbf{S}_{xy}$ and use the statistic

$$T_n=\int x\,dF^{\mathbf{T}_{xy}}(x). \tag{2.10}$$


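We next establish the CLTs of the statistics (2.7) and (2.10). To this end, write

$$G^{(1)}_{p_1,p_2}(\lambda)=p_1\big(F^{\mathbf{S}_{xy}}(\lambda)-F^{c_{1n},c_{2n}}(\lambda)\big) \tag{2.11}$$

and

$$G^{(2)}_{p_1,p_2}(\lambda)=p_1\big(F^{\mathbf{T}_{xy}}(\lambda)-F^{c_{1n}',c_{2n}'}(\lambda)\big), \tag{2.12}$$

where $F^{c_{1n},c_{2n}}(\lambda)$ and $F^{c_{1n}',c_{2n}'}(\lambda)$ are obtained from $F^{c_1,c_2}(\lambda)$ and $F^{c_1',c_2'}(\lambda)$ with $c_1,c_2,c_1',c_2'$ and $H$ replaced by $c_{1n}=p_1/n$, $c_{2n}=p_2/n$, $c_{1n}'=p_1/n$, $c_{2n}'=p_2/n$ and $F^{\Sigma_{xx}}$, respectively; $F^{c_1,c_2}(\lambda)$ and $F^{c_1',c_2'}(\lambda)$ are the limiting spectral distributions of the matrices $\mathbf{S}_{xy}$ and $\mathbf{T}_{xy}$, respectively. The density of $F^{c_1,c_2}(\lambda)$ can be obtained from $p(x)$ in (2.2), while the density of $F^{c_1',c_2'}(\lambda)$ can be recovered from (2.5). We renormalize (2.7) and (2.10) as

$$\int\phi(\lambda)\,dG^{(1)}_{p_1,p_2}(\lambda):=p_1\Big(\int\phi(\lambda)\,dF^{\mathbf{S}_{xy}}(\lambda)-\int\phi(\lambda)\,dF^{c_{1n},c_{2n}}(\lambda)\Big), \tag{2.13}$$
$$\int\phi(\lambda)\,dG^{(2)}_{p_1,p_2}(\lambda):=p_1\Big(\int\phi(\lambda)\,dF^{\mathbf{T}_{xy}}(\lambda)-\int\phi(\lambda)\,dF^{c_{1n}',c_{2n}'}(\lambda)\Big). \tag{2.14}$$

Moreover, let

$$y_1:=\frac{c_1}{1-c_2}\in(0,+\infty),\qquad y_2:=\frac{c_1}{c_2}\in(0,1),$$
$$h=\sqrt{y_1+y_2-y_1y_2},\qquad a_1=\frac{(1-h)^2}{(1-y_2)^2},\qquad a_2=\frac{(1+h)^2}{(1-y_2)^2},$$
$$g_{y_1,y_2}(\lambda)=\frac{1-y_2}{2\pi\lambda(y_1+y_2\lambda)}\sqrt{(a_2-\lambda)(\lambda-a_1)},\qquad a_1<\lambda<a_2. \tag{2.15}$$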
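The density $g_{y_1,y_2}$ in (2.15) is the standard F-matrix limiting density. A quick numerical sanity check, assuming $y_1<1$ (so there is no atom at zero), is that it integrates to one and has mean $1/(1-y_2)$, the limiting average of the eigenvalues of $\mathbf{S}_2^{-1}$. The parameter values and helper names below are illustrative only.

```python
import numpy as np

def f_matrix_density(lam, y1, y2):
    """Density g_{y1,y2} of (2.15) on [a1, a2]; returns (density, a1, a2)."""
    h = np.sqrt(y1 + y2 - y1 * y2)
    a1 = (1 - h) ** 2 / (1 - y2) ** 2
    a2 = (1 + h) ** 2 / (1 - y2) ** 2
    root = np.sqrt(np.clip((a2 - lam) * (lam - a1), 0.0, None))
    return (1 - y2) * root / (2 * np.pi * lam * (y1 + y2 * lam)), a1, a2

def integrate(f, a, b, N=4000):
    # Midpoint rule after substituting lam = (a + b)/2 + (b - a)/2 * cos(theta)
    theta = (np.arange(N) + 0.5) * np.pi / N
    lam = (a + b) / 2 + (b - a) / 2 * np.cos(theta)
    return np.sum(f(lam) * (b - a) / 2 * np.sin(theta)) * np.pi / N

y1, y2 = 0.5, 0.3  # y1 < 1, so there is no atom at zero
_, a1, a2 = f_matrix_density(1.0, y1, y2)
density = lambda lam: f_matrix_density(lam, y1, y2)[0]
mass = integrate(density, a1, a2)
mean = integrate(lambda lam: lam * density(lam), a1, a2)
print(mass, mean)  # close to 1 and to 1 / (1 - y2)
```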
Theorem 2.2. Let $\phi_1,\dots,\phi_s$ be functions analytic in an open region of the complex plane containing the interval $[a_1,a_2]$. In addition to Assumptions 1 and 3, suppose that

$$EW_{11}^4=3. \tag{2.16}$$

Then, as $n\to\infty$, the random vector

$$\Big(\int\phi_1(\lambda)\,dG^{(1)}_{p_1,p_2}(\lambda),\dots,\int\phi_s(\lambda)\,dG^{(1)}_{p_1,p_2}(\lambda)\Big) \tag{2.17}$$

converges weakly to a Gaussian vector $(X_{\phi_1},\dots,X_{\phi_s})$ with mean

$$EX_{\phi_i}=\lim_{r\downarrow1}\frac{1}{4\pi i}\oint_{|\xi|=1}\phi_i\Big(\frac{1+h^2+2h\Re(\xi)}{(1-y_2)^2}\Big)\Big[\frac{1}{\xi-r^{-1}}+\frac{1}{\xi+r^{-1}}-\frac{2}{\xi+y_2/h}\Big]d\xi \tag{2.18}$$

and covariance function

$$\operatorname{cov}(X_{\phi_i},X_{\phi_j})=-\lim_{r\downarrow1}\frac{1}{2\pi^2}\oint_{|\xi_1|=1}\oint_{|\xi_2|=1}\frac{\phi_i\Big(\frac{1+h^2+2h\Re(\xi_1)}{(1-y_2)^2}\Big)\phi_j\Big(\frac{1+h^2+2h\Re(\xi_2)}{(1-y_2)^2}\Big)}{(\xi_1-r\xi_2)^2}\,d\xi_1\,d\xi_2, \tag{2.19}$$

where $\Re(\xi)$ denotes the real part of a complex number $\xi$, and $r\downarrow1$ means that $r$ approaches 1 from above.

Remark 3. When $\phi(x)=x$, the mean of the limiting distribution in Theorem 2.2 is 0, and its variance can be written explicitly; both are calculated in Example 4.2 of [22]. Moreover, assumption (2.16) can be replaced by $EV_{11}^4=3$, since $\mathbf{X}$ and $\mathbf{Y}$ play symmetric roles in the matrix $\mathbf{S}_{xy}$.

Before stating the CLT of the linear spectral statistics for the matrix $\mathbf{T}_{xy}$, we introduce some notation. Let $r$ be a positive integer and define

$$m_r(z)=\int\frac{dH_t(x)}{(x-z+\omega(z))^r},\qquad \omega(z)=\frac{1}{1+y_2m_{y_2,t}(z)},$$
$$g(z)=\frac{y_2\big(m_{y_2,t}(-\underline{m}_{\mathbf{y}}(z))\big)'}{\big(1+y_2m_{y_2,t}(-\underline{m}_{\mathbf{y}}(z))\big)^2},\qquad s(z_1,z_2)=\frac{1}{1+y_2m_{y_2,t}(z_1)}-\frac{1}{1+y_2m_{y_2,t}(z_2)},$$
$$h(z)=\frac{-\underline{m}_{\mathbf{y}}^2(z)}{1-y_1\underline{m}_{\mathbf{y}}^2(z)\int dF_{y_2,t}(x)/(x+\underline{m}_{\mathbf{y}}(z))^2},$$

where $(m_{y_2,t}(z))'$ stands for the derivative with respect to $z$.

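Theorem 2.3. Let $\phi_1,\dots,\phi_s$ be functions analytic in an open region of the complex plane containing the support of the LSD $F(\lambda)$ whose Stieltjes transform is given by (2.5). In addition to Assumptions 2–4, suppose that the spectral norm of $\Sigma_{xx}$ is bounded and

$$EW_{11}^4=3. \tag{2.20}$$

(a) If $c_2'\in(0,1)$, then the random vector

$$\Big(\int\phi_1(\lambda)\,dG^{(2)}_{p_1,p_2}(\lambda),\dots,\int\phi_s(\lambda)\,dG^{(2)}_{p_1,p_2}(\lambda)\Big) \tag{2.21}$$

converges weakly to a Gaussian vector $(X_{\phi_1},\dots,X_{\phi_s})$ with mean

$$EX_{\phi_i}=\frac{1}{2\pi i}\oint\phi_i\Big(\frac{qz}{1+qz}\Big)\Bigg(\frac{y_1\int\underline{m}_{\mathbf{y}}^3(z)(x+\underline{m}_{\mathbf{y}}(z))^{-3}\,dF_{y_2,t}(x)}{\big(1-y_1\int\underline{m}_{\mathbf{y}}^2(z)(x+\underline{m}_{\mathbf{y}}(z))^{-2}\,dF_{y_2,t}(x)\big)^2}+\mathcal{R}(z)\Bigg)dz, \tag{2.22}$$

where $\mathcal{R}(z)$ denotes two further terms, each proportional to $h(z)$, involving $\omega$, $m_{y_2,t}'$, $m_2$ and $m_3$ evaluated at $-\underline{m}_{\mathbf{y}}(z)$, and covariance

$$\operatorname{cov}(X_{\phi_i},X_{\phi_j})=-\frac{1}{2\pi^2}\oint_{\mathcal{C}_1}\oint_{\mathcal{C}_2}\phi_i\Big(\frac{qz_1}{1+qz_1}\Big)\phi_j\Big(\frac{qz_2}{1+qz_2}\Big)\Bigg(\frac{\underline{m}_{\mathbf{y}}'(z_1)\underline{m}_{\mathbf{y}}'(z_2)}{(\underline{m}_{\mathbf{y}}(z_1)-\underline{m}_{\mathbf{y}}(z_2))^2}-\frac{1}{(z_1-z_2)^2}-\frac{h(z_1)h(z_2)}{(-\underline{m}_{\mathbf{y}}(z_2)+\underline{m}_{\mathbf{y}}(z_1))^2}$$
$$+\frac{h(z_1)h(z_2)\big[1+g(z_1)+g(z_2)+g(z_1)g(z_2)\big]}{\big[-\underline{m}_{\mathbf{y}}(z_2)+\underline{m}_{\mathbf{y}}(z_1)+s(-\underline{m}_{\mathbf{y}}(z_1),-\underline{m}_{\mathbf{y}}(z_2))\big]^2}\Bigg)dz_1\,dz_2. \tag{2.23}$$

Here, $q$ is defined in Theorem 2.1. The contours in (2.22) and (2.23) [two in (2.23), which may be assumed to be nonoverlapping] are closed and are taken in the positive direction in the complex plane, each enclosing the support of $F(\lambda)$.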

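(b) If $c_2'\in[1,+\infty)$ ($p_2\ge n$), the random vector (2.21) converges weakly to a Gaussian vector $(X_{\phi_1},\dots,X_{\phi_s})$ with mean

$$EX_{\phi_i}=-\frac{1}{2\pi i}\oint\phi_i\Big(\frac{t^{-1}z}{1+t^{-1}z}\Big)\frac{c_1'\int s^3(z)\lambda^2(1+\lambda s(z))^{-3}\,dH(\lambda)}{\big(1-c_1'\int s^2(z)\lambda^2(1+\lambda s(z))^{-2}\,dH(\lambda)\big)^2}\,dz \tag{2.24}$$

and covariance

$$\operatorname{cov}(X_{\phi_i},X_{\phi_j})=-\frac{1}{2\pi^2}\oint\oint\phi_i\Big(\frac{t^{-1}z_1}{1+t^{-1}z_1}\Big)\phi_j\Big(\frac{t^{-1}z_2}{1+t^{-1}z_2}\Big)\frac{s'(z_1)s'(z_2)}{(s(z_1)-s(z_2))^2}\,dz_1\,dz_2, \tag{2.25}$$

where $s(z)$ is the Stieltjes transform of the LSD of the matrix $\frac1n\mathbf{W}^T\Sigma_{xx}\mathbf{W}$. The contours in (2.24) and (2.25) [two in (2.25), which may be assumed to be nonoverlapping] are closed and are taken in the positive direction in the complex plane, each enclosing the support of $G(\lambda)$.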


Here, we would like to point out that the idea of testing independence between two random vectors $\mathbf{x}$ and $\mathbf{y}$ by CCA is based on the fact that uncorrelatedness between $\mathbf{x}$ and $\mathbf{y}$ is equivalent to independence between them when the random vector of size $p_1+p_2$ consisting of the components of $\mathbf{x}$ and $\mathbf{y}$ is Gaussian. See Wilks [20] and Anderson [1]. For non-Gaussian random vectors $\mathbf{x}$ and $\mathbf{y}$, uncorrelatedness is not equivalent to independence, and CCA may fail in this case. Yet, since Theorems 2.2 and 2.3 hold for non-Gaussian random vectors $\mathbf{x}$ and $\mathbf{y}$, CCA can still be used to capture dependent but uncorrelated $\mathbf{x}$ and $\mathbf{y}$, such as ARCH-type dependence, by considering high powers of their entries. See Section 6.5 for further discussion.
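To illustrate the point about higher powers, take a hypothetical toy model (not the ARCH design of Section 6.5) in which $y_{ij}=x_{ij}u_{ij}$ with $x_{ij},u_{ij}$ i.i.d. $N(0,1)$. Then $\operatorname{Cov}(x_{ij},y_{ij})=0$, but $\operatorname{Corr}(x_{ij}^2,y_{ij}^2)=2/\sqrt{2\cdot8}=1/2$, so applying the statistic $S_n$ to the centred squared entries reveals the dependence that the raw entries hide. A minimal sketch under these assumptions:

```python
import numpy as np

def statistic_S_n(X, Y):
    # (1/p1) tr(A_xx^{-1} A_xy A_yy^{-1} A_xy^T); assumes p1, p2 < n and mean-zero rows
    n = X.shape[1]
    A_xx, A_yy, A_xy = X @ X.T / n, Y @ Y.T / n, X @ Y.T / n
    return np.trace(np.linalg.solve(A_xx, A_xy @ np.linalg.solve(A_yy, A_xy.T))) / X.shape[0]

rng = np.random.default_rng(4)
p, n = 50, 400
X = rng.standard_normal((p, n))
U = rng.standard_normal((p, n))
Y = X * U                     # entrywise uncorrelated with X, but dependent on it
raw = statistic_S_n(X, Y)
# Squared entries, centred by their population means E x^2 = E y^2 = 1
squared = statistic_S_n(X**2 - 1.0, Y**2 - 1.0)
print(raw, squared)           # squared is clearly larger than raw
```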

Following [14], condition (2.16) can be removed. However, doing so would significantly increase the length of this work, and we do not pursue it here.

3. Test statistics. Note that the regularized statistic $\int\lambda\,dG^{(2)}_{p_1,p_2}(\lambda)$ in (2.14) [with $\phi(\lambda)=\lambda$] involves the unknown covariance matrix through $F^{c_{1n}',c_{2n}'}(\lambda)$. In order to apply it to conduct tests, one needs to estimate this unknown parameter. It is well known that estimating the population covariance matrix $\Sigma_{xx}$ is very challenging unless it is sparse. [8] and [3] proposed approaches to estimate the limit of the ESD of $\Sigma_{xx}$ or its moments. However, the convergence rate is not fast enough to offset the order of $p_1$. Indeed, Theorem 1 of [3] implies that the best possible convergence rate is $O_p(\dots)$. In view of this, we provide two methods to deal with the problem. One is to estimate $\int\lambda\,dF^{c_{1n}',c_{2n}'}(\lambda)$ in a framework of sparsity, while the other is to eliminate this unknown parameter by dividing the samples into two groups.


3.1. Plug-in estimator under sparsity. When $c_2'<1$, it turns out that

$$\int\lambda\,dF^{c_{1n}',c_{2n}'}(\lambda)=\frac{p_2}{p_1}-\frac{p_2}{p_1}\cdot\frac{1}{1+c_{1n}'m_{nt}}, \tag{3.1}$$

where $m_{nt}$ is a solution to the equation

$$m_{nt}=a_n-\frac{t}{p_1}\operatorname{tr}\big(a_n^{-1}\Sigma_{xx}+t\mathbf{I}\big)^{-1} \tag{3.2}$$

with $a_n=1+c_{1n}'m_{nt}$ (see the proof of Theorem 3.1). An estimator of $m_{nt}$ is then proposed as $\hat m_{nt}$, a solution to the equation

$$\hat m_{nt}=\hat a_n-\frac{t}{p_1}\operatorname{tr}\big(\hat a_n^{-1}\hat\Sigma_{xx}+t\mathbf{I}\big)^{-1} \tag{3.3}$$

with $\hat a_n=1+c_{1n}'\hat m_{nt}$. Here, we use a thresholding estimator $\hat\Sigma_{xx}$ of $\Sigma_{xx}$, slightly different from the one proposed by [6]. Specifically, suppose that the underlying random variables $\{X_{ij}\}$ have mean zero and variance one. Then define $\hat\Sigma_{xx}$ to be the matrix whose diagonal entries are all one and whose off-diagonal entries are $\hat\sigma_{ij}I(|\hat\sigma_{ij}|>\tau)$ with $\tau=M\sqrt{\log p_1/n}$, where $M$ is an appropriate constant ($M$ will be selected by cross-validation). Here $\hat\sigma_{ij}$ denotes the $(i,j)$th entry of the sample covariance matrix $\frac1n\mathbf{X}\mathbf{X}^T$. Therefore, the resulting test statistic is

$$p_1\Big(\int\lambda\,dF^{\mathbf{T}_{xy}}(\lambda)-\Big(\frac{p_2}{p_1}-\frac{p_2}{p_1}\cdot\frac{1}{1+c_{1n}'\hat m_{nt}}\Big)\Big). \tag{3.4}$$
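The plug-in procedure of (3.1)–(3.4) can be sketched as follows. Assumptions, stated loudly: entries are standardized as in the text, the threshold takes the form $\tau=M\sqrt{\log p_1/n}$, (3.2)/(3.3) are solved by plain fixed-point iteration (the map is a contraction when $c_{1n}'<1$, which is assumed here), and all function names are hypothetical. The example also checks that at $t=0$ the centering reduces to $p_2/n$, the mean of $r_i^2$.

```python
import numpy as np

def threshold_cov(X, M=1.0):
    """Thresholded estimate of Sigma_xx for standardized entries: unit diagonal,
    off-diagonal sample entries kept only if |sigma_ij| > tau, tau = M*sqrt(log(p1)/n).
    Note: the result need not be positive definite in general."""
    p1, n = X.shape
    S = X @ X.T / n
    tau = M * np.sqrt(np.log(p1) / n)
    Sig = np.where(np.abs(S) > tau, S, 0.0)
    np.fill_diagonal(Sig, 1.0)
    return Sig

def solve_m_nt(sigma_eigs, c1, t, tol=1e-12, max_iter=10_000):
    """Fixed point of m = a - (t/p1) tr(a^{-1} Sigma + t I)^{-1}, a = 1 + c1*m (cf. (3.2)).
    Plain iteration; the map is a contraction when c1 < 1, which is assumed here."""
    m = 1.0
    for _ in range(max_iter):
        a = 1.0 + c1 * m
        m_new = a - t * np.mean(1.0 / (sigma_eigs / a + t))
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    raise RuntimeError("fixed-point iteration did not converge")

def centering_c2_below_one(p1, p2, n, sigma_eigs, t):
    # Right-hand side of (3.1)
    c1 = p1 / n
    m = solve_m_nt(sigma_eigs, c1, t)
    return p2 / p1 - (p2 / p1) / (1.0 + c1 * m)

def plug_in_statistic(X, Y, t, M=1.0):
    # Statistic (3.4): tr(T_xy) - p1 * (plug-in centering)
    p1, n = X.shape
    p2 = Y.shape[0]
    A_xx, A_yy, A_xy = X @ X.T / n, Y @ Y.T / n, X @ Y.T / n
    T = np.linalg.solve(A_xx + t * np.eye(p1), A_xy @ np.linalg.pinv(A_yy) @ A_xy.T)
    sigma_eigs = np.linalg.eigvalsh(threshold_cov(X, M))
    return np.trace(T) - p1 * centering_c2_below_one(p1, p2, n, sigma_eigs, t)

rng = np.random.default_rng(5)
p1, p2, n = 100, 200, 400
X = rng.standard_normal((p1, n))
Y = rng.standard_normal((p2, n))
print(centering_c2_below_one(p1, p2, n, np.ones(p1), t=0.0))  # p2 / n = 0.5
print(plug_in_statistic(X, Y, t=1.0, M=2.0))
```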
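When $p_2\ge n$, it turns out that

$$\int\lambda\,dF^{c_{1n}',c_{2n}'}(\lambda)=1-t\tilde m_{nt}, \tag{3.5}$$

where $\tilde m_{nt}$ satisfies the equation

$$\tilde m_{nt}=\frac{1}{p_1}\operatorname{tr}\big((1-c_{1n}'+c_{1n}'t\tilde m_{nt})\Sigma_{xx}+t\mathbf{I}\big)^{-1}. \tag{3.6}$$

We then propose the resulting test statistic

$$p_1\Big(\int\lambda\,dF^{\mathbf{T}_{xy}}(\lambda)-(1-t\hat{\tilde m}_{nt})\Big), \tag{3.7}$$

where $\hat{\tilde m}_{nt}$ satisfies the equation

$$\hat{\tilde m}_{nt}=\frac{1}{p_1}\operatorname{tr}\big((1-c_{1n}'+c_{1n}'t\hat{\tilde m}_{nt})\hat\Sigma_{xx}+t\mathbf{I}\big)^{-1}. \tag{3.8}$$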
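Equations (3.6) and (3.8) have a single positive root that can be found robustly by bisection: the right-hand side $F(m)$ is decreasing in $m$, so $m-F(m)$ is increasing, and a sign change is bracketed by $\max(0,(c_1-1)/(c_1t))$ and $1/t$. The sketch below (hypothetical helper names; eigenvalues of $\Sigma_{xx}$ or $\hat\Sigma_{xx}$ assumed positive) checks the solver against the closed-form root available when $\Sigma_{xx}=\mathbf{I}$, both for $c_1<1$ and $c_1>1$.

```python
import numpy as np

def solve_m_tilde(sigma_eigs, c1, t, iters=200):
    """Solve m = (1/p1) tr(((1 - c1 + c1*t*m) Sigma + t I)^{-1}) by bisection (cf. (3.6)).

    F(m) below is decreasing in m on the bracket, so m - F(m) is increasing and has
    a single sign change between lo = max(0, (c1 - 1)/(c1 t)) and hi = 1/t.
    """
    F = lambda m: np.mean(1.0 / ((1.0 - c1 + c1 * t * m) * sigma_eigs + t))
    lo, hi = max(0.0, (c1 - 1.0) / (c1 * t)), 1.0 / t
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid < F(mid):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def centering_p2_at_least_n(sigma_eigs, c1, t):
    # Right-hand side of (3.5): 1 - t * m_tilde
    return 1.0 - t * solve_m_tilde(sigma_eigs, c1, t)

def closed_form_identity(c1, t):
    # For Sigma = I, (3.6) reduces to c1 t m^2 + (1 - c1 + t) m - 1 = 0; take the positive root.
    b = 1.0 - c1 + t
    return (-b + np.sqrt(b * b + 4.0 * c1 * t)) / (2.0 * c1 * t)

for c1, t in [(0.5, 1.0), (2.0, 0.5)]:
    print(solve_m_tilde(np.ones(50), c1, t), closed_form_identity(c1, t))
```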


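Theorem 3.1. In addition to the assumptions in Theorem 2.3, suppose that $EX_{ij}^2=1$, that a higher-order absolute moment of $X_{ij}$ is bounded uniformly in $i$ and $j$, and that

$$s_0(p_1)\Big(\frac{\log p_1}{n}\Big)^{(1-q)/2}\to0, \tag{3.9}$$

where $\max_i\sum_{j=1}^{p_1}|\sigma_{ij}|^q\le s_0(p_1)$ with $0\le q<1$.

(a) If $c_2'<1$, then $p_1\Big(\int\lambda\,dF^{\mathbf{T}_{xy}}(\lambda)-\Big(\frac{p_2}{p_1}-\frac{p_2}{p_1}\cdot\frac{1}{1+c_{1n}'\hat m_{nt}}\Big)\Big)$ converges weakly to a normal distribution with the mean and variance given in (2.22) and (2.23) with $\phi(\lambda)=\lambda$.

(b) If $c_2'\ge1$, then $p_1\Big(\int\lambda\,dF^{\mathbf{T}_{xy}}(\lambda)-(1-t\hat{\tilde m}_{nt})\Big)$ converges weakly to a normal distribution with the mean and variance given in part (b) of Theorem 2.3 with $\phi(\lambda)=\lambda$.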


In the simulation part, we give an example of sparse covariance matrices satisfying the sparsity condition (3.9).

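3.2. Strategy of dividing samples. If (3.9) is not satisfied, we instead propose to divide the total sample into two groups. Specifically, we divide the $n$ samples of $(\mathbf{x},\mathbf{y})$ into two groups, that is,

$$\text{Group 1: }\ \mathbf{X}^{(1)}=(\mathbf{x}_1,\mathbf{x}_2,\dots,\mathbf{x}_{[n/2]}),\qquad \mathbf{Y}^{(1)}=(\mathbf{y}_1,\mathbf{y}_2,\dots,\mathbf{y}_{[n/2]}) \tag{3.10}$$

and

$$\text{Group 2: }\ \mathbf{X}^{(2)}=(\mathbf{x}_{[n/2]+1},\mathbf{x}_{[n/2]+2},\dots,\mathbf{x}_n),\qquad \mathbf{Y}^{(2)}=(\mathbf{y}_{[n/2]+1},\mathbf{y}_{[n/2]+2},\dots,\mathbf{y}_n), \tag{3.11}$$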






where $[n/2]$ is the largest integer not greater than $n/2$. When $n$ is odd, we discard the last sample. However, if this strategy of dividing the samples into two groups is used directly, then the asymptotic means of the resulting statistic [the difference between the statistics in (2.14) obtained from the two subsamples] are always zero under both the null and the alternative hypotheses, owing to the similarity of the two groups, so that the power of the test is very low. This is also confirmed by simulations. Therefore, we further propose a modified version as follows.

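For $\mathbf{Y}^{(2)}$ in Group 2, we extract a sub-data set $\tilde{\mathbf{Y}}^{(2)}$, that is,

$$\tilde{\mathbf{Y}}^{(2)}=(\tilde{\mathbf{y}}_{[n/2]+1},\tilde{\mathbf{y}}_{[n/2]+2},\dots,\tilde{\mathbf{y}}_n),$$

where $\tilde{\mathbf{y}}_j$ consists of the first $[p_2/2]$ components of $\mathbf{y}_j$, for all $j=[n/2]+1,[n/2]+2,\dots,n$. We use $\tilde{\mathbf{Y}}^{(2)}$ to form a new group

$$\text{Modified Group 2: }\ \mathbf{X}^{(2)}=(\mathbf{x}_{[n/2]+1},\mathbf{x}_{[n/2]+2},\dots,\mathbf{x}_n),\qquad \tilde{\mathbf{Y}}^{(2)}=(\tilde{\mathbf{y}}_{[n/2]+1},\tilde{\mathbf{y}}_{[n/2]+2},\dots,\tilde{\mathbf{y}}_n).$$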
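For Group 1, it follows from Theorem 2.3 that

$$\int\lambda\,d\,p_1\big(F^{\mathbf{T}_{xy}^{(1)}}(\lambda)-F^{2c_{1n}',2c_{2n}'}(\lambda)\big)\xrightarrow{D}Z_1, \tag{3.12}$$

where $\mathbf{T}_{xy}^{(1)}$ is obtained from $\mathbf{T}_{xy}$ with $\mathbf{X}$ and $\mathbf{Y}$ replaced by $\mathbf{X}^{(1)}$ and $\mathbf{Y}^{(1)}$, respectively, and $Z_1$ is a normal random variable with mean and variance given in Theorem 2.3 with $c_1'$ and $c_2'$ replaced by $2c_1'$ and $2c_2'$, respectively, and $\phi(\lambda)=\lambda$. Similarly, for Modified Group 2, by Theorem 2.3,

$$\int\lambda\,d\,p_1\big(F^{\tilde{\mathbf{T}}_{xy}^{(2)}}(\lambda)-F^{2c_{1n}',c_{2n}'}(\lambda)\big)\xrightarrow{D}Z_2, \tag{3.13}$$

where $\tilde{\mathbf{T}}_{xy}^{(2)}$ is $\mathbf{T}_{xy}$ with $\mathbf{X}$ and $\mathbf{Y}$ replaced by $\mathbf{X}^{(2)}$ and $\tilde{\mathbf{Y}}^{(2)}$, respectively, and $Z_2$ is a normal random variable with the mean and variance given in Theorem 2.3 with $\phi(\lambda)=\lambda$ and $c_1'$ replaced by $2c_1'$.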

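We next investigate the relation between

$$\int\lambda\,dF^{2c_{1n}',2c_{2n}'}(\lambda)\quad\text{and}\quad\int\lambda\,dF^{2c_{1n}',c_{2n}'}(\lambda),$$

and then take a suitable difference of the two statistics in (3.12) and (3.13) in order to eliminate the unknown parameters $\int\lambda\,dF^{2c_{1n}',2c_{2n}'}(\lambda)$ and $\int\lambda\,dF^{2c_{1n}',c_{2n}'}(\lambda)$.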

When $c_2'<1/2$, we have

$$\int\lambda\,dF^{2c_{1n}',2c_{2n}'}(\lambda)=\frac{p_2}{p_1}-\frac{p_2}{p_1}\cdot\frac{1}{1+2c_{1n}'\bar m_{nt}}, \tag{3.14}$$





Y. YANG AND G. PAN 


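where $\bar m_{nt}$ is obtained from $m_{nt}$ satisfying (3.2) with $c_{1n}'$ replaced by $2c_{1n}'$. On the other hand,

$$\int\lambda\,dF^{2c_{1n}',c_{2n}'}(\lambda)=\frac{p_2/2}{p_1}-\frac{p_2/2}{p_1}\cdot\frac{1}{1+2c_{1n}'\bar m_{nt}}. \tag{3.15}$$

It follows that

$$\int\lambda\,dF^{2c_{1n}',2c_{2n}'}(\lambda)=2\int\lambda\,dF^{2c_{1n}',c_{2n}'}(\lambda). \tag{3.16}$$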


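When $[p_2/2]\ge[n/2]$, we have

$$\int\lambda\,dF^{2c_{1n}',2c_{2n}'}(\lambda)=\int\lambda\,dF^{2c_{1n}',c_{2n}'}(\lambda)=1-t\bar{\tilde m}_{nt}, \tag{3.17}$$

where $\bar{\tilde m}_{nt}$ satisfies (3.6) with $c_{1n}'$ replaced by $2c_{1n}'$.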

The last case is $[p_2/2]<[n/2]$ and $c_2'\ge1/2$. In this case, if we still consider Group 1 and Modified Group 2, then

$$\int\lambda\,dF^{2c_{1n}',2c_{2n}'}(\lambda)=1-t\bar{\tilde m}_{nt},\qquad \int\lambda\,dF^{2c_{1n}',c_{2n}'}(\lambda)=\frac{p_2/2}{p_1}-\frac{p_2/2}{p_1}\cdot\frac{1}{1+2c_{1n}'\bar m_{nt}}.$$

From these formulas, it is difficult to figure out the relation between $\int\lambda\,dF^{2c_{1n}',2c_{2n}'}(\lambda)$ and $\int\lambda\,dF^{2c_{1n}',c_{2n}'}(\lambda)$, which depends on the unknown parameter $\Sigma_{xx}$. To overcome this difficulty, we also apply a "sub-data" trick to Group 1. Specifically, consider the following Modified Group 1.

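$$\text{Modified Group 1: }\ \mathbf{X}^{(1)}=(\mathbf{x}_1,\mathbf{x}_2,\dots,\mathbf{x}_{[n/2]}),\qquad \check{\mathbf{Y}}^{(1)}=(\check{\mathbf{y}}_1,\check{\mathbf{y}}_2,\dots,\check{\mathbf{y}}_{[n/2]}),$$

where $\check{\mathbf{y}}_k$ consists of the last $[p_2/2]$ components of $\mathbf{y}_k$, that is, the $i$th component of $\check{\mathbf{y}}_k$ is the $([p_2/2]+i)$th component of $\mathbf{y}_k$, for all $i=1,2,\dots,[p_2/2]$ and $k=1,2,\dots,[n/2]$. For Modified Group 1, by Theorem 2.3, we have

$$\int\lambda\,d\,p_1\big(F^{\check{\mathbf{T}}_{xy}^{(1)}}(\lambda)-F^{2c_{1n}',c_{2n}'}(\lambda)\big)\xrightarrow{D}Z_3, \tag{3.18}$$

where $\check{\mathbf{T}}_{xy}^{(1)}$ is $\mathbf{T}_{xy}$ with $\mathbf{X}$ and $\mathbf{Y}$ replaced by $\mathbf{X}^{(1)}$ and $\check{\mathbf{Y}}^{(1)}$, respectively, and $Z_3$ is a normal random variable with the mean and variance given in Theorem 2.3 with $\phi(\lambda)=\lambda$ and $c_1'$ replaced by $2c_1'$. Since the unknown parameters in (3.13) and (3.18) are the same, the difference between the statistics in (3.13) and (3.18) can be taken as the modified statistic.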

The asymptotic distributions of the three resulting statistics are given in 
Theorem 3.2. 
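The sample-splitting scheme can be sketched in a few lines. The code builds Group 1, Modified Group 1 and Modified Group 2 from data with observations in columns (dropping the last column when $n$ is odd), and then forms the differences of $p_1\int\lambda\,dF^{\mathbf{T}_{xy}}(\lambda)=\operatorname{tr}(\mathbf{T}_{xy})$ that Theorem 3.2 below studies, choosing the case from $p_2/n$ as a stand-in for $c_2'$. Function names, the boundary convention at $c_2'=1$, and the simulated sizes are assumptions for illustration.

```python
import numpy as np

def split_groups(X, Y):
    """Return (Group 1, Modified Group 1, Modified Group 2) of Section 3.2.
    Columns of X (p1 x n) and Y (p2 x n) are observations; an odd last column is dropped."""
    n, p2 = X.shape[1], Y.shape[0]
    h, q = n // 2, p2 // 2                        # [n/2] and [p2/2]
    group1 = (X[:, :h], Y[:, :h])
    mod_group1 = (X[:, :h], Y[q:2 * q, :h])       # components [p2/2]+1, ..., 2[p2/2] of y_k
    mod_group2 = (X[:, h:2 * h], Y[:q, h:2 * h])  # first [p2/2] components of y_j
    return group1, mod_group1, mod_group2

def trace_T(X, Y, t):
    # p1 * (integral of lambda dF^{T_xy}) = tr(T_xy)
    p1, n = X.shape
    A_xx, A_yy, A_xy = X @ X.T / n, Y @ Y.T / n, X @ Y.T / n
    return np.trace(np.linalg.solve(A_xx + t * np.eye(p1), A_xy @ np.linalg.pinv(A_yy) @ A_xy.T))

def divided_sample_statistic(X, Y, t):
    """Differences used in Theorem 3.2, with the case chosen from c2n = p2 / n."""
    g1, g1_mod, g2_mod = split_groups(X, Y)
    c2 = Y.shape[0] / X.shape[1]
    if c2 < 0.5:                                       # case (a)
        return trace_T(*g1, t) - 2.0 * trace_T(*g2_mod, t)
    if c2 >= 1.0:                                      # case (b)
        return trace_T(*g1, t) - trace_T(*g2_mod, t)
    return trace_T(*g1_mod, t) - trace_T(*g2_mod, t)   # case (c)

rng = np.random.default_rng(6)
X = rng.standard_normal((30, 201))
Y = rng.standard_normal((140, 201))   # c2n is about 0.70, so case (c) applies
g1, g1_mod, g2_mod = split_groups(X, Y)
stat = divided_sample_statistic(X, Y, t=1.0)
print(stat)
```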


Theorem 3.2. Suppose that the assumptions in Theorem 2.3 hold.

(a) If $c_2'<1/2$, the statistic $p_1\big(\int\lambda\,dF^{\mathbf{T}_{xy}^{(1)}}(\lambda)-2\int\lambda\,dF^{\tilde{\mathbf{T}}_{xy}^{(2)}}(\lambda)\big)$ converges weakly to a normal distribution with mean $\mu_1-2\mu_2$ and variance $\sigma_1^2+4\sigma_2^2$, where $\mu_1$ and $\sigma_1^2$ are given in (2.22) and (2.23), respectively, with $c_1',c_2'$ replaced by $2c_1',2c_2'$, respectively, and $\phi(\lambda)=\lambda$; $\mu_2$ and $\sigma_2^2$ are given in (2.22) and (2.23), respectively, with $c_1'$ replaced by $2c_1'$ and $\phi(\lambda)=\lambda$.

(b) If $c_2'\ge1$, the statistic $p_1\big(\int\lambda\,dF^{\mathbf{T}_{xy}^{(1)}}(\lambda)-\int\lambda\,dF^{\tilde{\mathbf{T}}_{xy}^{(2)}}(\lambda)\big)$ converges weakly to a normal distribution with mean zero and variance $2\sigma_3^2$, where $\sigma_3^2$ is given in (2.25) with $c_1'$ replaced by $2c_1'$ and $\phi(\lambda)=\lambda$.

(c) If $1/2\le c_2'<1$, the statistic $p_1\big(\int\lambda\,dF^{\check{\mathbf{T}}_{xy}^{(1)}}(\lambda)-\int\lambda\,dF^{\tilde{\mathbf{T}}_{xy}^{(2)}}(\lambda)\big)$ converges weakly to a normal distribution with mean zero and variance $2\sigma_4^2$, where $\sigma_4^2$ is given in (2.23) with $c_1'$ replaced by $2c_1'$ and $\phi(\lambda)=\lambda$.

Remark 4. Unlike the statistic based on Group 2 of (3.11), the statistics in cases (b) and (c) have asymptotic means that are zero under the null hypothesis but not necessarily zero under the alternative hypothesis, so that the power of the resulting tests is much better.


Theorem 3.2 proposes test statistics that do not involve the unknown parameter $H$, the LSD of the matrix $\Sigma_{xx}$. However, their asymptotic means and variances contain terms involving $H$. Below we provide consistent estimators for the terms appearing in (2.22)-(2.25) of Theorem 2.3, which, together with Slutsky's theorem and the dominated convergence theorem, suffice for applications.

We use the estimator of $H$ developed in [8]. For ease of reference, we briefly state it in the following proposition.


Proposition 2 (Theorem 2 of [8]). In addition to Assumptions 2-4, suppose that the spectral norm of $\Sigma_{xx}$ is bounded. Let $J_1, J_2, \ldots$ be a sequence of integers tending to $\infty$. Let $z_0 \in \mathbb{C}^+$ and $r \in \mathbb{R}^+$ be such that $B(z_0, r) \subset \mathbb{C}^+$, where $B(z_0, r)$ denotes the closed ball with center $z_0$ and radius $r$. Let $z_1, z_2, \ldots$ be a sequence of complex numbers with a limit point, all contained in $B(z_0, r)$. Let $\hat H_{p_1}$ be the solution of

$$\hat H_{p_1} = \arg\min_{G} \max_{j \le J_n} \left| \frac{1}{\underline{s}_n(z_j)} + z_j - \frac{p_1}{n} \int \frac{\lambda \, dG(\lambda)}{1 + \lambda \underline{s}_n(z_j)} \right|, \qquad (3.19)$$

where $G$ ranges over probability measures and $\underline{s}_n(z)$ is the Stieltjes transform of the ESD of the matrix $\underline{A}_{xx} = \frac{1}{n} X^T X$. Then

$$\hat H_{p_1} \Rightarrow H \quad \text{a.s.}$$


Remark 5. The estimator in (3.19) is based on the Marčenko-Pastur equation, which links the Stieltjes transform of the empirical spectral distribution of the sample covariance matrix to an integral against the population spectral distribution; that is, the LSD $H$ satisfies the Marčenko-Pastur equation

$$-\frac{1}{s(z)} = z - c_1 \int \frac{\lambda \, dH(\lambda)}{1 + \lambda s(z)}, \qquad \forall z \in \mathbb{C}^+, \qquad (3.20)$$

where $s(z)$ is the limit of $\underline{s}_n(z) = \frac{1}{n} \operatorname{tr}(\underline{A}_{xx} - zI_n)^{-1}$.


El Karoui [8] developed an algorithm for computing the estimator $\hat H_{p_1}$ in (3.19), which we state below.

(1) “Basis pursuit” in measure space. Instead of searching among all possible probability measures, the search space in (3.19) is restricted to mixtures of certain classes of probability measures. In other words, one first selects a “dictionary” of probability measures on the real line and then decomposes the estimator on this dictionary, searching for the best coefficients. Hence, the problem can be formulated as finding the best possible weights $\{w_1, \ldots, w_K\}$ with

$$dG = \sum_{i=1}^{K} w_i \, dM_i, \qquad \sum_{i=1}^{K} w_i = 1, \quad w_i \ge 0,$$

where the $M_i$'s are the measures in the dictionary. A dictionary of probability measures is given as follows:

1. Point masses $\delta_{t_k}(x)$, where $\{t_k\}$ is a grid of points.

2. Probability measures that are uniform on an interval $[a_i, b_i]$; in this case, $dM_i^{(1)}(x) = \mathbb{1}_{[a_i, b_i]}(x) \, dx/(b_i - a_i)$.

3. Probability measures that have a linearly increasing (or decreasing) density on an interval $[d_i, h_i]$ and density 0 elsewhere. For the increasing case, $dM_i^{(2)}(x) = \mathbb{1}_{[d_i, h_i]}(x) \cdot 2(x - d_i)/(h_i - d_i)^2 \, dx$.

(2) A convex optimization problem. Let

$$e_j = \frac{1}{\underline{s}_n(z_j)} + z_j - \frac{p_1}{n} \int \frac{\lambda \, dG(\lambda)}{1 + \lambda \underline{s}_n(z_j)}, \qquad j = 1, 2, \ldots, J_n,$$

where $G(\cdot)$ has the form

$$dG(x) = \sum_{i=1}^{K_1} w_i \delta_{t_i}(x) + \sum_{i=K_1+1}^{K_2} w_i \, dM_i^{(1)}(x) + \sum_{i=K_2+1}^{K} w_i \, dM_i^{(2)}(x),$$

with all points $t_k$, $k = 1, 2, \ldots, K_1$, intervals $[a_i, b_i]$, $i = K_1 + 1, \ldots, K_2$, and intervals $[d_j, h_j]$, $j = K_2 + 1, \ldots, K$, lying in the interval $[\ell_{p_1}, \ell_1]$. Here, $\ell_{p_1}$ and $\ell_1$ are, respectively, the smallest and largest eigenvalues of the sample covariance matrix $A_{xx} = \frac{1}{n} XX^T$. Moreover, $1 < K_1 < K_2 < K$.

The “translation” of problem (3.19) into a convex optimization problem is

$$\min u \quad \text{subject to} \quad -u \le \operatorname{Re}(e_j) \le u, \quad -u \le \operatorname{Im}(e_j) \le u, \quad \forall j = 1, \ldots, J_n,$$

$$\sum_{i=1}^{K} w_i = 1 \quad \text{and} \quad w_i \ge 0, \quad \forall i = 1, 2, \ldots, K.$$

The following proposition provides consistency of the proposed algorithm 
above. 


Proposition 3 (Corollary 1 of [8]). Under the assumptions of Proposition 2, let $\hat H_{p_1}$ be the solution of (3.19), where the optimization is now over measures that are sums of atoms whose locations are restricted to a grid (depending on $n$) with step size going to 0 as $n \to \infty$. Then

$$\hat H_{p_1} \Rightarrow H \quad \text{a.s.}$$

The estimator $\hat H_{p_1}$ derived from the algorithm then has the form

$$d\hat H_{p_1}(x) = \sum_{i=1}^{K_1} \hat w_i \delta_{t_i}(x) + \sum_{i=K_1+1}^{K_2} \hat w_i \, dM_i^{(1)}(x) + \sum_{i=K_2+1}^{K} \hat w_i \, dM_i^{(2)}(x), \qquad (3.21)$$

where $\{\hat w_i, i = 1, 2, \ldots, K\}$ is the solution of the convex optimization problem above. In practice, we follow the implementation details in the Appendix of [8] to compute $\hat H_{p_1}(x)$.
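The two steps above can be sketched with SciPy's linear-programming solver. This is a simplified, hypothetical implementation, not the code of [8]: following Proposition 3, the dictionary contains only point masses on a grid, and the grid and the points $z_j$ are illustrative choices rather than those recommended in the Appendix of [8]. All function names are ours.

```python
import numpy as np
from scipy.optimize import linprog


def companion_stieltjes(X, z):
    """s_n(z) = (1/n) tr(X^T X / n - z I_n)^{-1} at each point of z; X is p x n."""
    n = X.shape[1]
    eig = np.linalg.eigvalsh(X.T @ X / n)              # the n eigenvalues of the n x n matrix
    return np.array([np.mean(1.0 / (eig - zj)) for zj in z])


def dictionary_matrix(grid, s, ratio):
    """Entry (j, i) = (p/n) t_i / (1 + t_i s_n(z_j)) for a point mass at t_i."""
    return ratio * grid[None, :] / (1.0 + grid[None, :] * s[:, None])


def residuals(weights, grid, s, z, ratio):
    """e_j = 1/s_n(z_j) + z_j - (p/n) sum_i w_i t_i / (1 + t_i s_n(z_j))."""
    return 1.0 / s + z - dictionary_matrix(grid, s, ratio) @ weights


def estimate_spectrum(X, grid, z):
    """min u s.t. |Re e_j| <= u, |Im e_j| <= u, sum_i w_i = 1, w_i >= 0."""
    p, n = X.shape
    s = companion_stieltjes(X, z)
    a = 1.0 / s + z                                    # part of e_j free of the weights
    C = dictionary_matrix(grid, s, p / n)
    J, K = C.shape
    ones = np.ones((J, 1))
    A_ub, b_ub = [], []
    for part in (np.real, np.imag):
        # part(a) - part(C) w <= u   and   -(part(a) - part(C) w) <= u
        A_ub += [np.hstack([-part(C), -ones]), np.hstack([part(C), -ones])]
        b_ub += [-part(a), part(a)]
    cost = np.r_[np.zeros(K), 1.0]                     # objective: minimise u
    A_eq = np.r_[np.ones(K), 0.0][None, :]             # weights sum to one
    res = linprog(cost, A_ub=np.vstack(A_ub), b_ub=np.concatenate(b_ub),
                  A_eq=A_eq, b_eq=[1.0], bounds=[(0, None)] * (K + 1), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:K], res.x[K]


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 400))        # Sigma_xx = I, so H is a point mass at 1
grid = np.linspace(0.1, 3.0, 59)           # hypothetical grid of atoms t_i
z = np.linspace(0.3, 2.5, 20) + 0.5j       # hypothetical points z_j in C^+
w, u = estimate_spectrum(X, grid, z)
```

In the example, $\Sigma_{xx} = I$, so the returned weights should tend to concentrate on grid points near 1, and `u` is the smallest achievable maximal residual over the grid.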

Once the estimator (3.21) of the LSD $H$ is available, we are in a position to provide estimators for $m_H(z)$ and $m_{y_2t}(z)$, the Stieltjes transforms of $H$ and of the corresponding LSD, respectively.

Proposition 4. For any $z \in \mathcal{C}_1$, the contour specified in Theorem 2.3, let

$$\hat m_H(z) = \sum_{i=1}^{K_1} \frac{\hat w_i}{t_i - z} + \sum_{i=K_1+1}^{K_2} \frac{\hat w_i}{b_i - a_i} \log\frac{b_i - z}{a_i - z} + \sum_{i=K_2+1}^{K} \frac{2\hat w_i}{(h_i - d_i)^2} \left[ h_i - d_i + (z - d_i) \log\frac{h_i - z}{d_i - z} \right], \qquad (3.22)$$

where log denotes the principal branch. The estimator $\hat m_{y_2t}(z)$ satisfies the equation


(3.23) 


'^y2t{z) = rhHt 



1 

I + y2nmy2tiz) 


where muti^) is 


(3.24) 


mufiz) 


^ (1 -C 2 n)^)' 


Then $\hat m_H(z)$ in (3.22) and $\hat m_{y_2t}(z)$ in (3.23) are consistent estimators of $m_H(z)$ and $m_{y_2t}(z)$, respectively.
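Formula (3.22) evaluates $m(z) = \int dG(x)/(x - z)$ of each dictionary element in closed form. A minimal Python sketch follows; the function name and argument layout are hypothetical. `np.log` of a complex argument is the principal branch, which gives the correct value here because $x - z$ stays in the lower half-plane for real $x$ and $z \in \mathbb{C}^+$.

```python
import numpy as np


def stieltjes_mixture(z, pts=(), w_pts=(), unif=(), w_unif=(), lin=(), w_lin=()):
    """m(z) = int dG(x) / (x - z) for a dictionary mixture G, following (3.22).

    pts:  atoms t_i                                  w_pts:  their weights
    unif: intervals (a_i, b_i), uniform density      w_unif: their weights
    lin:  intervals (d_i, h_i), density 2(x - d_i)/(h_i - d_i)^2   w_lin: weights
    """
    z = complex(z)
    m = sum(w / (t - z) for t, w in zip(pts, w_pts))
    m += sum(w / (b - a) * np.log((b - z) / (a - z)) for (a, b), w in zip(unif, w_unif))
    m += sum(2 * w / (h - d) ** 2 * ((h - d) + (z - d) * np.log((h - z) / (d - z)))
             for (d, h), w in zip(lin, w_lin))
    return m
```

The linear-density term follows from writing $x - d_i = (x - z) + (z - d_i)$ inside $\int_{d_i}^{h_i} 2(x - d_i)/((h_i - d_i)^2 (x - z)) \, dx$.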


Remark 6. For the choice of the intervals $[a_i, b_i]$, $i = K_1 + 1, K_1 + 2, \ldots, K_2$, and $[d_j, h_j]$, $j = K_2 + 1, K_2 + 2, \ldots, K$, in (3.21), we follow the implementation details provided in the Appendix of [8]. The choice of $(z_j, \underline{s}_n(z_j))$ in the convex optimization problem, the interval on which $H(\cdot)$ is estimated, and the dictionary are also all taken from the Appendix of [8].


With the consistent estimators of $m_H(z)$ and $m_{y_2t}(z)$ in Proposition 4, we can further provide consistent estimators for all terms appearing in (2.22)-(2.25).


Proposition 5. 1. The estimator for $m_y(z)$ is

(ttIi) -' - ’ 

where yin = y2n = Qn = -rip- and rm: iz) is the Stieltjes trans- 

^2n ^ ^2n ^ ^2n ^ 

form of the matrix Txy. 

2. The estimators for $w(z)$ and $s(z_1, z_2)$ are, respectively,

w(z) = - -;- 

I + y2nmy.2t[z) 

s{zi,Z 2 ) I + y 2 nrhy. 2 t{zi) 1 + y2n?hj^2i(^2) ’ 

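3. The estimators for $m_r(z)$ with $r = 2, 3$ are

$$\hat m_2(z) = \hat m_{H_t}^{(1)}(z - \hat w(z)), \qquad \hat m_3(z) = \tfrac{1}{2} \hat m_{H_t}^{(2)}(z - \hat w(z)),$$

respectively, where $\hat m_{H_t}^{(j)}(z)$ is the $j$th derivative of $\hat m_{H_t}(z)$ with respect to $z$, $j = 1, 2$.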
4. The estimators for $g(z)$ and $h(z)$ are


9{z) = 


(1 + y2n'my^ti-niy{z))y ’ 


h{z) = 


-fhliz) 


■y'- 

— i-yi^ 


1 - yin]Iil(z)m^J^t(-my{z)) 

respectively, where my{z) = + yiniTiyiz). 

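5. The estimators for $w_1 = \int m_y(z) \, x \, [x + m_y(z)]^{-3} \, dF^{y_2t}(x)$ and $w_2 = \int m_y(z) [x + m_y(z)]^{-2} \, dF^{y_2t}(x)$ are

$$\hat w_1 = \hat m_y(z) \, \hat m'_{y_2t}(-\hat m_y(z)) - \tfrac{1}{2} \hat m_y^2(z) \, \hat m''_{y_2t}(-\hat m_y(z)),$$

$$\hat w_2 = \hat m_y(z) \, \hat m'_{y_2t}(-\hat m_y(z)),$$

respectively.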

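6. The estimators for $w_3 = \int [1 + \lambda s(z)]^{-3} s^2(z) \lambda^2 \, dH(\lambda)$ and $w_4 = \int s^2(z) \lambda^2 [1 + \lambda s(z)]^{-2} \, dH(\lambda)$ are

$$\hat w_3 = \frac{1}{\underline{s}_n(z)} \hat m_H\Big(-\frac{1}{\underline{s}_n(z)}\Big) - \frac{2}{[\underline{s}_n(z)]^2} \hat m_H^{(1)}\Big(-\frac{1}{\underline{s}_n(z)}\Big) + \frac{1}{2[\underline{s}_n(z)]^3} \hat m_H^{(2)}\Big(-\frac{1}{\underline{s}_n(z)}\Big),$$

$$\hat w_4 = 1 - \frac{2}{\underline{s}_n(z)} \hat m_H\Big(-\frac{1}{\underline{s}_n(z)}\Big) + \frac{1}{[\underline{s}_n(z)]^2} \hat m_H^{(1)}\Big(-\frac{1}{\underline{s}_n(z)}\Big),$$

respectively, where $\underline{s}_n(z) = \frac{1}{n} \operatorname{tr}(\underline{A}_{xx} - zI_n)^{-1}$ with $\underline{A}_{xx} = \frac{1}{n} X^T X$.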

All estimators listed above are consistent for the corresponding unknown 
parameters. 

The proofs of Propositions 4 and 5 are provided in Appendix A of [23]. 
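The plug-in formulas in items 5 and 6 rest on the identity $\int dG(x)/(x - \zeta)^{k+1} = m_G^{(k)}(\zeta)/k!$, evaluated at $\zeta = -m_y(z)$ or $\zeta = -1/s(z)$. The sketch below checks this numerically under the assumption that both measures are finite mixtures of point masses, so every integral is a finite sum; all names are hypothetical, and the values of $m_y$ and $s$ are arbitrary test points rather than solutions of the defining equations.

```python
import numpy as np
from math import factorial


def m_deriv(zeta, atoms, weights, order=0):
    """order-th derivative of m(zeta) = sum_i weights_i / (atoms_i - zeta)."""
    return factorial(order) * np.sum(weights / (atoms - zeta) ** (order + 1))


def w_hat(my, s, y_atoms, y_w, h_atoms, h_w):
    """Plug-in expressions of Proposition 5, items 5 and 6."""
    w1 = my * m_deriv(-my, y_atoms, y_w, 1) - 0.5 * my**2 * m_deriv(-my, y_atoms, y_w, 2)
    w2 = my * m_deriv(-my, y_atoms, y_w, 1)
    z0 = -1.0 / s
    w3 = (m_deriv(z0, h_atoms, h_w) / s - 2 * m_deriv(z0, h_atoms, h_w, 1) / s**2
          + m_deriv(z0, h_atoms, h_w, 2) / (2 * s**3))
    w4 = 1 - 2 * m_deriv(z0, h_atoms, h_w) / s + m_deriv(z0, h_atoms, h_w, 1) / s**2
    return w1, w2, w3, w4


def w_direct(my, s, y_atoms, y_w, h_atoms, h_w):
    """The defining integrals of w_1, ..., w_4 against the same discrete measures."""
    x, lam = y_atoms, h_atoms
    w1 = np.sum(y_w * my * x / (x + my) ** 3)
    w2 = np.sum(y_w * my / (x + my) ** 2)
    w3 = np.sum(h_w * s**2 * lam**2 / (1 + lam * s) ** 3)
    w4 = np.sum(h_w * s**2 * lam**2 / (1 + lam * s) ** 2)
    return w1, w2, w3, w4
```

For $w_3$ and $w_4$ the key step is $s\lambda = (1 + \lambda s) - 1$, which expands $s^2\lambda^2[1 + \lambda s]^{-k}$ into powers of $(1 + \lambda s)^{-1}$.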

4. The power under local alternatives. This section evaluates the power of $S_n$ or $T_n$ under a class of local alternatives. Consider the alternative hypothesis

$H_1$: $x$ and $y$ are dependent,

satisfying condition (4.1) below. Draw $n$ samples from such alternatives $x$ and $y$ to form the respective analogues of (1.5) and (1.6), and denote them by $S$ and $T$, respectively. Suppose that the underlying random variables involved in $S_{xy}$, $T_{xy}$ and $S$, $T$ are defined on the same probability space $(\Omega, P)$.


Recall the definitions of $G_{p_1,p_2}^{(i)}$, $i = 1, 2$, in (2.11) and (2.12), and let $R_n^{(i)} = \int \lambda \, dG_{p_1,p_2}^{(i)}(\lambda)$, $i = 1, 2$.


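Theorem 4.1. In addition to the assumptions in Theorem 2.2 or 2.3, suppose that for any $M > 0$,

$$P(|\operatorname{tr}(S - S_{xy})| > M) \to 1, \qquad P(|\operatorname{tr}(T - T_{xy})| > M) \to 1. \qquad (4.1)$$

Then

$$\lim_{n \to \infty} P\big(R_n^{(i)} > z_{1-\alpha}^{(i)} \text{ or } R_n^{(i)} < z_{\alpha}^{(i)} \,\big|\, H_1\big) = 1, \qquad (4.2)$$

where $z_{1-\alpha}^{(i)}$ and $z_{\alpha}^{(i)}$ are, respectively, the $(1 - \alpha)$ and $\alpha$ quantiles of the asymptotic distribution of the statistic $R_n^{(i)}$ under the null hypothesis.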


Remark 7. For example, one may take $S = (XLX^T)^{-1} XLY^T (YLY^T)^{-1} YLX^T$ and $S_{xy} = (XX^T)^{-1} XP_yX^T$, with $L$ a random matrix and $P_y = Y^T(YY^T)^{-1}Y$. In particular, if $L = I + ee^T$ with $e = \xi(1, 1, \ldots, 1)^T$ and $\xi$ having a finite moment, then under the assumptions of Theorem 2.2 or 2.3 it can be proved that

$$\operatorname{tr}(S - S_{xy}) = O_p(n),$$

satisfying (4.1).
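The matrices in Remark 7 can be formed directly. The sketch below assumes $p_1, p_2 < n$ so that all inverses exist, and uses $L = I + ee^T$ with $e = \xi(1, \ldots, 1)^T$ and $\xi \sim N(0, 1)$ purely as a hypothetical choice; the function name is ours. For $L = I$ the two matrices coincide, which is a convenient sanity check, and the eigenvalues of $S_{xy}$ are squared sample canonical correlations in $[0, 1]$.

```python
import numpy as np


def s_matrices(X, Y, L):
    """S = (X L X^T)^{-1} X L Y^T (Y L Y^T)^{-1} Y L X^T and
    S_xy = (X X^T)^{-1} X P_y X^T with P_y = Y^T (Y Y^T)^{-1} Y.
    X is p1 x n, Y is p2 x n, L is n x n."""
    S = np.linalg.solve(X @ L @ X.T, X @ L @ Y.T) @ np.linalg.solve(Y @ L @ Y.T, Y @ L @ X.T)
    P_y = Y.T @ np.linalg.solve(Y @ Y.T, Y)
    S_xy = np.linalg.solve(X @ X.T, X @ P_y @ X.T)
    return S, S_xy


rng = np.random.default_rng(1)
n = 50
X, Y = rng.standard_normal((5, n)), rng.standard_normal((8, n))
e = rng.standard_normal() * np.ones(n)       # hypothetical xi * (1, ..., 1)^T
S, S_xy = s_matrices(X, Y, np.eye(n) + np.outer(e, e))
trace_gap = np.trace(S - S_xy)
```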


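Next, we evaluate the powers of the modified statistics based on the dividing-sample method. Draw $n$ samples from the alternatives $x$ and $y$ to form the respective analogues of $T_{xy}^{(i)}$, $i = 1, 2$, and $\tilde T_{xy}$, and denote them by $T^{(i)}$, $i = 1, 2$, and $\tilde T$, respectively. Let

$$J_n^{(1)} = \int x \, dF^{T^{(1)}}(x) - 2 \int x \, dF^{T^{(2)}}(x),$$

$$J_n^{(2)} = \int x \, dF^{T^{(1)}}(x) - \int x \, dF^{T^{(2)}}(x),$$

$$J_n^{(3)} = \int x \, dF^{\tilde T}(x) - \int x \, dF^{T^{(2)}}(x).$$

Theorem 4.2. In addition to the assumptions in Theorem 2.3, suppose that for any $M > 0$,

$$P\big(|\operatorname{tr}(T^{(1)}) - 2\operatorname{tr}(T^{(2)}) - (\operatorname{tr}(T_{xy}^{(1)}) - 2\operatorname{tr}(T_{xy}^{(2)}))| > M\big) \to 1 \quad \text{if } c_2 < 1/2; \qquad (4.3)$$

$$P\big(|\operatorname{tr}(T^{(1)}) - \operatorname{tr}(T^{(2)}) - (\operatorname{tr}(T_{xy}^{(1)}) - \operatorname{tr}(T_{xy}^{(2)}))| > M\big) \to 1 \quad \text{if } c_2 > 1; \qquad (4.4)$$

$$P\big(|\operatorname{tr}(\tilde T) - \operatorname{tr}(T^{(2)}) - (\operatorname{tr}(\tilde T_{xy}) - \operatorname{tr}(T_{xy}^{(2)}))| > M\big) \to 1 \quad \text{if } 1/2 < c_2 < 1. \qquad (4.5)$$

Then

$$\lim_{n \to \infty} P\big(J_n^{(i)} > z_{1-\alpha}^{(i)} \text{ or } J_n^{(i)} < z_{\alpha}^{(i)} \,\big|\, H_1\big) = 1, \qquad i = 1, 2, 3,$$

where $z_{1-\alpha}^{(i)}$ and $z_{\alpha}^{(i)}$ are, respectively, the $(1 - \alpha)$ and $\alpha$ quantiles of the asymptotic distribution of the statistic $J_n^{(i)}$ under the null hypothesis, $i = 1, 2, 3$.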


5. Applications of CCA. This section explores some applications of the 
proposed test. We consider two examples from multivariate analysis and 
time series analysis, respectively. 


5.1. Multivariate regression test with CCA. Consider the multivariate regression (MR) model

$$Y = XB + E, \qquad (5.1)$$

where

$$Y = [y_1, y_2, \ldots, y_{p_1}]_{n \times p_1}, \quad X = [x_1, x_2, \ldots, x_{p_2}]_{n \times p_2}, \quad B = [\beta_1, \beta_2, \ldots, \beta_{p_1}]_{p_2 \times p_1}, \quad E = [e_1, e_2, \ldots, e_{p_1}]_{n \times p_1},$$

each of the vectors $y_j, e_j$, $j = 1, 2, \ldots, p_1$, and $x_j$, $j = 1, 2, \ldots, p_2$, is an $n \times 1$ vector, and $\{\beta_j, j = 1, 2, \ldots, p_1\}$ are $p_2 \times 1$ vectors.

Let $A_{xy} = \frac{1}{n} X^T Y$ and $A_{xx} = \frac{1}{n} X^T X$. The least squares estimate of $B$ is

$$\hat B = A_{xx}^{-1} A_{xy}. \qquad (5.2)$$

The most common hypothesis test here is whether there is a linear relationship between the two sets of variables (response variables and predictor variables), that is, the overall regression test

$$H_0: B = 0. \qquad (5.3)$$


To test $H_0: B = 0$, Wilks' $\Lambda$ criterion is

$$\Lambda = \frac{\det(E)}{\det(E + H)} = \prod_{i=1}^{s} (1 + \lambda_i)^{-1}, \qquad (5.4)$$

where

$$E = Y^T (I - X(X^TX)^{-1}X^T) Y \qquad (5.5)$$

and

$$H = \hat B^T (X^TX) \hat B, \qquad (5.6)$$

and $\{\lambda_i: i = 1, \ldots, s\}$ are the roots of $\det(H - \lambda E) = 0$, $s = \min(p_1, p_2)$. An alternative form of $\Lambda$ uses sample covariance matrices. That is, $H = A_{yx} A_{xx}^{-1} A_{xy}$ and $E = A_{yy} - A_{yx} A_{xx}^{-1} A_{xy}$, so that $\det(H - \lambda E) = 0$ becomes $\det(A_{yx} A_{xx}^{-1} A_{xy} - \lambda(A_{yy} - A_{yx} A_{xx}^{-1} A_{xy})) = 0$. From Theorem 2.6.8 of [18], we have $\det(H - \theta(H + E)) = \det(A_{yx} A_{xx}^{-1} A_{xy} - \theta A_{yy}) = 0$, so that

$$\Lambda = \prod_{i=1}^{s} (1 + \lambda_i)^{-1} = \prod_{i=1}^{s} (1 - \theta_i). \qquad (5.7)$$

Evidently, the quantities $r_i^2 = \theta_i$, $i = 1, \ldots, s$, are the squared sample canonical correlation coefficients. Therefore, the test statistic (5.4) can be rewritten as

$$\log \Lambda = \sum_{i=1}^{s} \log(1 - r_i^2). \qquad (5.8)$$

From this point of view, the multivariate regression test is equivalent to the independence test based on canonical correlation coefficients. As stated in the last section, the statistic $\log \Lambda$ is not stable in high dimensional cases. Hence, our test statistic $S_n$ or $T_n$ can be applied to the MR test.
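The identity $\Lambda = \prod_i (1 - r_i^2)$ in (5.7)-(5.8) is easy to check numerically. The sketch below uses hypothetical function names and, as in (5.1)-(5.2), no intercept and uncentered cross-products; note that here $X$ is $n \times p_2$ and $Y$ is $n \times p_1$. It computes Wilks' $\Lambda$ from (5.4)-(5.6) and the squared sample canonical correlations as the eigenvalues of $A_{yy}^{-1} A_{yx} A_{xx}^{-1} A_{xy}$.

```python
import numpy as np


def wilks_lambda(X, Y):
    """det(E) / det(E + H) for H0: B = 0 in Y = X B + E (no intercept)."""
    B_hat = np.linalg.solve(X.T @ X, X.T @ Y)        # least squares estimate (5.2)
    H = B_hat.T @ (X.T @ X) @ B_hat                    # (5.6)
    E = Y.T @ Y - H                                    # equals (5.5): Y^T (I - P_X) Y
    return np.linalg.det(E) / np.linalg.det(E + H)


def squared_canonical_correlations(X, Y):
    """Eigenvalues theta_i of A_yy^{-1} A_yx A_xx^{-1} A_xy (uncentered)."""
    n = X.shape[0]
    A_xx, A_yy, A_xy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n
    M = np.linalg.solve(A_yy, A_xy.T @ np.linalg.solve(A_xx, A_xy))
    return np.sort(np.linalg.eigvals(M).real)


rng = np.random.default_rng(2)
X = rng.standard_normal((60, 4))                       # n = 60, p2 = 4 predictors
Y = rng.standard_normal((60, 3))                       # p1 = 3 responses
lam, r2 = wilks_lambda(X, Y), squared_canonical_correlations(X, Y)
```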

5.2. Testing for cointegration with CCA. Consider an $n$-dimensional vector process $\{y_t\}$ that has a first-order error correction representation

$$\Delta y_t = -\alpha\beta' y_{t-1} + \varepsilon_t, \qquad t = 1, \ldots, T, \qquad (5.9)$$

where $\alpha$ and $\beta$ are full-rank $n \times r$ matrices ($r < n$) and the $n$-dimensional innovation $\{\varepsilon_t\}$ is i.i.d. with zero mean and positive definite covariance matrix $\Omega$. Select $\alpha$ and $\beta$ so that $|I_n - (I_n - \alpha\beta')z| = 0$ implies that either $|z| > 1$ or $z = 1$, and that $\alpha_\perp' \beta_\perp$ is of full rank, where $\alpha_\perp$ and $\beta_\perp$ are full-rank $n \times (n - r)$ matrices orthogonal to $\alpha$ and $\beta$. Under these assumptions, $\{y_t\}$ is $I(1)$ with $r$ cointegration relations among its elements; that is, $\{\beta' y_t\}$ is $I(0)$. Here, $I(d)$ denotes integrated of order $d$.

The goal is to test

$$H_0: r = 0 \ (\alpha = \beta = 0) \quad \text{against} \quad H_1: r > 0, \qquad (5.10)$$

that is, whether there exist cointegration relationships among the elements of the time series $\{y_t\}$.

This cointegration test is equivalent to testing

$H_0$: $\Delta y_t$ is independent of $\Delta y_{t-1}$; against
$H_1$: $\Delta y_t$ is dependent on $\Delta y_{t-1}$.

In order to apply canonical correlation coefficients to the cointegration test (5.10), we construct the random matrices

$$X = (\Delta y_2, \Delta y_4, \ldots, \Delta y_{2t-2}, \Delta y_{2t}, \ldots, \Delta y_T), \qquad (5.11)$$

$$Y = (\Delta y_1, \Delta y_3, \ldots, \Delta y_{2t-1}, \Delta y_{2t+1}, \ldots, \Delta y_{T-1}). \qquad (5.12)$$
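The construction (5.11)-(5.12) pairs each even-indexed difference with the preceding odd-indexed one. A sketch, assuming the levels $y_0, y_1, \ldots, y_T$ are stored as the columns of a NumPy array and $T$ is even; the function name is hypothetical.

```python
import numpy as np


def cointegration_cca_matrices(y):
    """X = (dy_2, dy_4, ..., dy_T) and Y = (dy_1, dy_3, ..., dy_{T-1}) as in (5.11)-(5.12).

    y: array of shape (dim, T + 1) holding y_0, y_1, ..., y_T (T even)."""
    dy = np.diff(y, axis=1)      # column t-1 is dy_t = y_t - y_{t-1}, t = 1, ..., T
    X = dy[:, 1::2]              # dy_2, dy_4, ..., dy_T
    Y = dy[:, 0::2]              # dy_1, dy_3, ..., dy_{T-1}
    return X, Y


rng = np.random.default_rng(3)
y = np.cumsum(rng.standard_normal((4, 11)), axis=1)   # random walk: H0 with alpha = beta = 0
X, Y = cointegration_cca_matrices(y)
```

Under $H_0$ the differences are i.i.d. innovations, so the columns of `X` and `Y` are independent, and the CCA-based statistics can be applied to the pair `(X, Y)`.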
6. Simulation results. This section reports simulated examples that show the finite-sample performance of the proposed tests.


6.1. Empirical sizes and empirical powers. First, we describe how the empirical sizes and empirical powers are calculated. Let $z_{1-\alpha}$ be the $100(1 - \alpha)\%$ quantile of the asymptotic null distribution of the test statistic $S_n$. With $K$ replications of the data set simulated under the null hypothesis, we calculate the empirical size as

$$\hat\alpha = \frac{\#\{S_n^{H_0} > z_{1-\alpha}\}}{K}, \qquad (6.1)$$

where $S_n^{H_0}$ represents the values of the test statistic $S_n$ based on the data simulated under the null hypothesis.

The empirical power is calculated as

$$\hat\beta = \frac{\#\{S_n^{H_1} > z_{1-\alpha}\}}{K}, \qquad (6.2)$$

where $S_n^{H_1}$ represents the values of the test statistic $S_n$ based on the data simulated under the alternative hypothesis.

In our simulations, we use $K = 1000$ replications. The significance level is $\alpha = 0.05$.
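Formulas (6.1) and (6.2) are the same rejection frequency applied to null and alternative replications. A sketch, assuming the statistic has been standardized so that its asymptotic null distribution is $N(0, 1)$, which makes $z_{1-\alpha}$ a standard normal quantile; the helper name is hypothetical.

```python
import numpy as np
from scipy.stats import norm


def rejection_rate(stats, critical_value):
    """Share of the K replications whose statistic exceeds the critical value:
    (6.1) for null replications, (6.2) for alternative replications."""
    stats = np.asarray(stats, dtype=float)
    return np.count_nonzero(stats > critical_value) / stats.size


alpha = 0.05
z_crit = norm.ppf(1 - alpha)           # z_{1-alpha} under the assumed N(0, 1) null limit

rng = np.random.default_rng(0)
size_example = rejection_rate(rng.standard_normal(1000), z_crit)   # K = 1000 null draws
```

With genuine $N(0,1)$ draws, `size_example` fluctuates around $\alpha = 0.05$, with a Monte Carlo standard error of about $\sqrt{0.05 \cdot 0.95/1000} \approx 0.007$.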


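6.2. Testing independence. Consider the data generating process

$$x = \Sigma_{xx}^{1/2} w, \qquad y = \Sigma_{yy}^{1/2} v, \qquad (6.3)$$

with

(a) $\Sigma_{xx} = I_{p_1}$, $\Sigma_{yy} = I_{p_2}$;

(b) $\Sigma_{xx} = (\theta_{kh})$, $\Sigma_{yy} = I_{p_2}$;

(c) $\Sigma_{xx} = (\sigma_{kh})_{k,h=1}^{p_1}$, $\Sigma_{yy} = I_{p_2}$;

(d) $\Sigma_{xx} = B^T \operatorname{cov}(f_t) B + \Sigma_u$, $\Sigma_{yy} = I_{p_2}$,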

where 

fl, 

k 

= h-. 

II 

0, 

k 

= 1; h = 2,3,..., or h= T, k 


lo, 

others 

with 6 = 0.2 and 





^kh — 


1 - 


, 2 ’ 


k,h = 1,2,... ,pi,(j) = 0.8. 




V3i. 


Here, B = -^^(bi, b2 ,..., bp^) is a deterministic matrix. In the simulation, 
each bj: r X 1 is generated independently from a normal distribution with 
Table 1

Empirical sizes of the proposed test $S_n$ and the renormalized likelihood ratio test $\mathrm{MLR}_n$ at the 0.05 significance level for DGP (a) and DGP (b)

(p1, p2, n)       S_n DGP (a)   S_n DGP (b)   MLR_n DGP (a)   MLR_n DGP (b)
(10, 20, 40)      0.0458        0.0461        0.0481          0.0490
(20, 30, 60)      0.0480        0.0488        0.0440          0.0448
(30, 60, 120)     0.0475        0.0480        0.0530          0.0520
(40, 80, 160)     0.0464        0.0466        0.0420          0.0420
(50, 100, 200)    0.0503        0.0504        0.0487          0.0500
(60, 120, 240)    0.0490        0.0490        0.0574          0.0572
(70, 140, 280)    0.0524        0.0520        0.0570          0.0582
(80, 160, 320)    0.0500        0.0500        0.0632          0.0583
(90, 180, 360)    0.0521        0.0511        0.0559          0.0580
(100, 200, 400)   0.0501        0.0503        0.0482          0.0589
(110, 220, 440)   0.0504        0.0500        0.0440          0.0590
(120, 240, 480)   0.0513        0.0511        0.0400          0.0432
(130, 260, 520)   0.0511        0.0511        0.0520          0.0560
(140, 280, 560)   0.0469        0.0474        0.0582          0.0580
(150, 300, 600)   0.0495        0.0500        0.0590          0.0593
(160, 320, 640)   0.0514        0.0517        0.0437          0.0559
(170, 340, 680)   0.0498        0.0500        0.0428          0.0430
(180, 360, 720)   0.0509        0.0510        0.0580          0.0577
(190, 380, 760)   0.0488        0.0485        0.0388          0.0499
(200, 400, 800)   0.0491        0.0491        0.0462          0.0499
(210, 420, 840)   0.0491        0.0500        0.0450          0.0555
(220, 440, 880)   0.0515        0.0510        0.0572          0.0588
(230, 460, 920)   0.0493        0.0498        0.0470          0.0488
(240, 480, 960)   0.0482        0.0479        0.0521          0.0561
(250, 500, 1000)  0.0452        0.0450        0.0527          0.0545


covariance matrix equal to the $r \times r$ identity matrix and mean vector with all entries equal to 1. $\operatorname{cov}(f_t)$ is also the $r \times r$ identity matrix, and $\Sigma_u$ is the $p_1 \times p_1$ identity matrix.

The empirical sizes of the proposed statistic $S_n$ for data generating processes (DGPs) (a) and (b) are listed in Table 1. For comparison, the empirical sizes of the renormalized statistic $\mathrm{MLR}_n$ are also included. Here the renormalized statistic $\mathrm{MLR}_n$ means the statistic

Pij log(l - A)d(F®"^(A) 

The empirical sizes of $T_n$ for DGPs (a)-(d) are listed in Table 2. For DGP (a), we use the original statistic $T_n$; for DGP (b), the statistic in Theorem 3.1 is used; for DGPs (c) and (d), the dividing-sample statistic in Theorem 3.2 is used. For Theorem 3.2, we follow the implementation details in the Appendix of [8] to estimate the LSD $H$ of $\Sigma_{xx}$.
Table 2

Empirical sizes of the proposed test $T_n$ at the 0.05 significance level for DGP (a)-DGP (d)

(p1, p2, n)       T_n DGP (a)   T_n DGP (b)   T_n DGP (c)   T_n DGP (d)
(100, 50, 80)     0.0569        0.0462        0.0642        0.0602
(140, 70, 120)    0.0573        0.0429        0.0619        0.0600
(180, 90, 150)    0.0577        0.0452        0.0623        0.0583
(200, 100, 170)   0.0552        0.0429        0.0594        0.0592
(240, 120, 180)   0.0581        0.0510        0.0602        0.0608
(280, 140, 250)   0.0571        0.0483        0.0592        0.0584
(320, 160, 270)   0.0521        0.0479        0.0603        0.0549
(360, 180, 290)   0.0529        0.0489        0.0574        0.0569
(400, 190, 300)   0.0542        0.0522        0.0589        0.0579
(440, 220, 330)   0.0557        0.0529        0.0542        0.0581
(480, 240, 350)   0.0531        0.0562        0.0579        0.0569

The parameter $t$ in the statistic $T_n$ takes the value 40. For DGP (a), we use the original statistic $T_n$ in Theorem 2.3; for DGP (b), the statistic in Theorem 3.1 is used; for DGPs (c) and (d), the dividing-sample statistic in Theorem 3.2 is used.


From the results in Tables 1 and 2, the proposed statistics $S_n$ and $T_n$ work well under Assumptions 1 and 2, respectively.

6.3. Factor model dependence. We consider the following factor model:

$$x_t = \Lambda_1 f_t + u_t, \qquad y_t = \Lambda_2 f_t + v_t, \qquad t = 1, 2, \ldots, n, \qquad (6.4)$$

where $\Lambda_1$ and $\Lambda_2$ are $p_1 \times r$ and $p_2 \times r$ deterministic matrices, respectively, built from the vectors $\lambda_k^{(1)}$ and $\lambda_k^{(2)}$, $k = 1, \ldots, r$. In the simulation, all the components of $\lambda_k^{(j)}$, $k = 1, 2, \ldots, r$, $j = 1, 2$, are generated from a normal distribution with mean 0.8 and variance 1; $f_t$, $t = 1, 2, \ldots, n$, are $r \times 1$ random vectors with i.i.d. standard Gaussian elements; and $u_t$ and $v_t$, $t = 1, 2, \ldots, n$, are independent random vectors whose elements are all standard Gaussian.

For this model, $x_t$ and $y_t$ are not independent if $r \neq 0$. The proposed test statistics $S_n$ and $T_n$ can be used to detect this dependence structure. Tables 3 and 4 report the powers of $S_n$ and $T_n$, respectively, as $r$ increases from 3 to 10. For $T_n$, we use its modified version in Theorem 3.2. The results in these tables indicate that, for a given triple $(p_1, p_2, n)$, the power increases as the number of factors $r$ increases. This makes sense, since the dependence between $x_t$ and $y_t$ is induced by the $r$ common factors in the factor vector $f_t$: the more common factors the model contains, the stronger the dependence between $x_t$ and $y_t$.
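One draw from the factor model (6.4) under the stated settings (loadings with entries from $N(0.8, 1)$, standard Gaussian factors and noise) can be sketched as follows. It assumes that $\Lambda_1$ and $\Lambda_2$ simply have the $\lambda_k^{(j)}$ as columns, with no extra scaling, and that the loadings are drawn once per call; both choices, and the function name, are assumptions.

```python
import numpy as np


def factor_model_sample(p1, p2, n, r, rng):
    """X (p1 x n) and Y (p2 x n) from (6.4), together with the loadings used."""
    Lambda1 = rng.normal(0.8, 1.0, size=(p1, r))     # entries of lambda_k^(1) ~ N(0.8, 1)
    Lambda2 = rng.normal(0.8, 1.0, size=(p2, r))     # entries of lambda_k^(2) ~ N(0.8, 1)
    F = rng.standard_normal((r, n))                   # columns are f_1, ..., f_n
    X = Lambda1 @ F + rng.standard_normal((p1, n))    # x_t = Lambda1 f_t + u_t
    Y = Lambda2 @ F + rng.standard_normal((p2, n))    # y_t = Lambda2 f_t + v_t
    return X, Y, Lambda1, Lambda2


rng = np.random.default_rng(4)
X, Y, L1, L2 = factor_model_sample(3, 2, 200000, 2, rng)
```

Because $E[x_t y_t^T] = \Lambda_1 \Lambda_2^T$, the cross-covariance is nonzero whenever $r > 0$, which is the dependence the tests are meant to detect.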

Here, we point out that CCA based on sample covariance matrices with the sample mean subtracted will incorrectly conclude that $x_t$ and
Table 3

Empirical powers of the proposed test $S_n$ at the 0.05 significance level for factor models

(p1, p2, n)       r = 3    r = 5    r = 7    r = 10
(10, 20, 40)      0.3750   0.5200   0.5910   0.9320
(30, 60, 120)     0.3070   0.6240   0.8450   0.9330
(50, 100, 200)    0.3090   0.6700   0.7980   0.9700
(70, 140, 280)    0.3520   0.6470   0.8330   0.9850
(90, 180, 360)    0.3670   0.6720   0.8230   0.9880
(110, 220, 440)   0.3570   0.6690   0.8490   0.9850
(130, 260, 520)   0.3440   0.6390   0.8510   0.9960
(150, 300, 600)   0.3780   0.6440   0.8370   0.9990
(170, 340, 680)   0.3580   0.6580   0.8590   1.0000
(190, 380, 760)   0.3490   0.6620   0.8720   1.0000
(210, 420, 840)   0.3460   0.6790   0.8890   1.0000
(230, 460, 920)   0.3800   0.6930   0.8770   1.0000
(250, 500, 1000)  0.3470   0.6890   0.8940   1.0000

The powers are under the alternative hypothesis that $x$ and $y$ satisfy the factor model (6.4); $r$ is the number of factors.




$y_t$ are independent even if $r > 0$, when $f_t = f$ does not depend on $t$, because the CCA of $x_t$ and $y_t$ is then the same as that of $u_t$ and $v_t$. This is why (1.4) and (1.6) are used.





6.4. Uncorrelated but dependent. The construction of (2.8) is based on the idea that the limit of $F^{S_{xy}}$ could not be determined from (2.2) when $x$ and $y$ are correlated. Thus, a natural question is whether our statistic


Table 4

Empirical powers of the proposed test $T_n$ at the 0.05 significance level for factor models

(p1, p2, n)       r = 3    r = 5    r = 7    r = 10
(100, 50, 80)     0.3680   0.6380   0.7330   0.9470
(140, 70, 120)    0.3380   0.6440   0.8690   0.9520
(180, 90, 150)    0.3290   0.6190   0.8890   0.9740
(200, 100, 170)   0.3410   0.6270   0.8920   0.9820
(240, 120, 180)   0.3340   0.6290   0.8840   0.9790
(280, 140, 250)   0.3570   0.6480   0.8730   0.9870
(320, 160, 270)   0.3490   0.7120   0.8890   0.9940
(360, 180, 290)   0.3690   0.6890   0.8930   0.9920
(400, 200, 310)   0.3830   0.7080   0.9030   0.9980
(440, 220, 330)   0.3920   0.7040   0.8930   1.0000
(480, 240, 350)   0.3970   0.6990   0.9110   1.0000

The powers are under the alternative hypothesis that $x$ and $y$ satisfy the factor model (6.4); $r$ is the number of factors. The parameter $t$ in the statistic $T_n$ takes the value 40. For $T_n$, we use its modified dividing-sample version in Theorem 3.2.
Table 5

Empirical powers of the proposed test $S_n$ at the 0.05 significance level for the uncorrelated but dependent case

(p1, p2, n)       ω = 4    ω = 10
(10, 20, 40)      0.8140   0.9690
(30, 60, 120)     0.8200   0.9510
(50, 100, 200)    0.8220   0.9600
(70, 140, 280)    0.8100   0.9610
(90, 180, 360)    0.8210   0.9640
(110, 220, 440)   0.8110   0.9670
(130, 260, 520)   0.8320   0.9740
(150, 300, 600)   0.8420   0.9740
(170, 340, 680)   0.8450   0.9760
(190, 380, 760)   0.8580   0.9680
(210, 420, 840)   0.8420   0.9670
(230, 460, 920)   0.8440   0.9810
(250, 500, 1000)  0.8620   0.9810

The powers are under the alternative hypothesis that $Y_{it} = X_{it}^{\omega} - EX_{it}^{\omega}$, $i = 1, 2, \ldots, p_1$, and $Y_{jt} = \varepsilon_{jt}$, $j = p_1 + 1, \ldots, p_2$; $t = 1, \ldots, n$, where $\varepsilon_{jt}$, $j = p_1 + 1, \ldots, p_2$; $t = 1, \ldots, n$, are standard normally distributed and independent of $X_{it}$, and $\omega = 4, 10$.


works in the uncorrelated but dependent case. Below is such an example, which demonstrates the power of the test statistic in detecting dependence between uncorrelated vectors.

Let $x_t = (X_{1t}, X_{2t}, \ldots, X_{p_1t})^T$, $t = 1, 2, \ldots, n$, be i.i.d. normally distributed random vectors with zero means and unit variances. Define $y_t = (Y_{1t}, Y_{2t}, \ldots, Y_{p_2t})^T$, $t = 1, 2, \ldots, n$, by $Y_{it} = X_{it}^{2k} - EX_{it}^{2k}$, $i = 1, 2, \ldots, \min(p_1, p_2)$, and, if $p_1 < p_2$, let $Y_{jt} = \varepsilon_{jt}$, $j = p_1 + 1, \ldots, p_2$, $t = 1, \ldots, n$, where $\varepsilon_{jt}$, $j = p_1 + 1, \ldots, p_2$, $t = 1, \ldots, n$, are i.i.d. normally distributed random variables independent of $x_t$, and $k$ is a positive integer.


Remark 8. For a standard normal random variable $X_{it}$, the $2k$th moment is $EX_{it}^{2k} = (2k)!/(2^k k!)$.


For this model, $x_t$ and $y_t$ are uncorrelated since $\operatorname{cov}(X_{it}, Y_{it}) = EX_{it}^{2k+1} - EX_{it} EX_{it}^{2k} = 0$. The simulation results in Tables 5 and 6 give the empirical powers of $S_n$ and $T_n$, respectively, for $k = 2$ and $k = 5$ (that is, $\omega = 2k = 4$ and 10 in the tables). They show that $S_n$ and $T_n$ detect this kind of dependence well, particularly when $k = 5$. For the statistic $T_n$, since the covariance matrix of $x$ is the identity matrix, we use the original statistic $T_n$ in Theorem 2.3.
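A sketch of this uncorrelated-but-dependent design in NumPy; the function names are hypothetical. `normal_even_moment` implements Remark 8, and the rows of `Y` beyond $\min(p_1, p_2)$ are independent $N(0, 1)$ noise.

```python
import numpy as np
from math import factorial


def normal_even_moment(k):
    """E X^{2k} = (2k)! / (2^k k!) for X ~ N(0, 1) (Remark 8)."""
    return factorial(2 * k) // (2 ** k * factorial(k))


def uncorrelated_dependent_sample(p1, p2, n, k, rng):
    """X with i.i.d. N(0, 1) entries; Y_it = X_it^{2k} - E X_it^{2k} for i <= min(p1, p2),
    and independent N(0, 1) noise in the remaining rows of Y."""
    X = rng.standard_normal((p1, n))
    Y = rng.standard_normal((p2, n))
    m = min(p1, p2)
    Y[:m] = X[:m] ** (2 * k) - normal_even_moment(k)
    return X, Y


rng = np.random.default_rng(5)
X, Y = uncorrelated_dependent_sample(3, 5, 200000, 2, rng)
```

For $k = 2$ the sample correlation between $X_{1t}$ and $Y_{1t}$ is close to zero, whereas the population correlation between $X_{1t}^2$ and $Y_{1t}$ is $12/\sqrt{192} \approx 0.87$, since $\operatorname{cov}(X^2, X^4) = 15 - 3 = 12$, $\operatorname{var}(X^2) = 2$ and $\operatorname{var}(X^4) = 96$.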


6.5. ARCH type dependence. The statistic works in the above example because, for that model, the limit of $F^{S_{xy}}$ cannot be determined from (2.2), although $x$ and $y$ are
Table 6

Empirical powers of the proposed test $T_n$ at the 0.05 significance level for the uncorrelated but dependent case

(p1, p2, n)       ω = 4    ω = 10
(100, 50, 80)     0.7240   0.8690
(140, 70, 120)    0.7940   0.8890
(180, 90, 150)    0.7830   0.8940
(200, 100, 170)   0.7910   0.9340
(240, 120, 180)   0.8420   0.9290
(280, 140, 250)   0.8680   0.9580
(320, 160, 270)   0.9010   0.9820
(360, 180, 290)   0.9190   0.9940
(400, 200, 310)   0.9530   0.9990
(440, 220, 330)   0.9820   1.0000
(480, 240, 350)   0.9940   1.0000

The powers are under the alternative hypothesis that $Y_{it} = X_{it}^{\omega} - EX_{it}^{\omega}$, $i = 1, 2, \ldots, \min(p_1, p_2)$, and, if $p_1 < p_2$, $Y_{jt} = \varepsilon_{jt}$, $j = p_1 + 1, \ldots, p_2$; $t = 1, \ldots, n$, where the $\varepsilon_{jt}$ are standard normally distributed and independent of $X_{it}$, and $\omega = 4, 10$. The parameter $t$ in the statistic $T_n$ takes the value 40. The original statistic $T_n$ in Theorem 2.3 is used.


uncorrelated. However, the limit of $F^{S_{xy}}(x)$ might coincide with (2.2) when $x$ and $y$ are uncorrelated. We consider such an example as follows.

Consider two random vectors $x_t = (X_{1t}, X_{2t}, \ldots, X_{p_1t})^T$ and $y_t = (Y_{1t}, Y_{2t}, \ldots, Y_{p_2t})^T$ defined as follows:

$$Y_{it} = Z_{it} \sqrt{\alpha_0 + \alpha_1 X_{it}^2}, \qquad i = 1, 2, \ldots, \min(p_1, p_2); \qquad (6.5)$$

$$\text{if } p_1 < p_2, \quad Y_{jt} = Z_{jt}, \qquad j = p_1 + 1, \ldots, p_2, \qquad (6.6)$$

where $z_t = (Z_{1t}, Z_{2t}, \ldots, Z_{p_2t})^T$ is a random vector consisting of i.i.d. $N(0, 1)$ elements and $\{z_t: t = 1, \ldots, n\}$ are independent across $t$; $x_t = (X_{1t}, X_{2t}, \ldots, X_{p_1t})^T$ is also a random vector with i.i.d. $N(0, 1)$ elements, and $\{x_t: t = 1, \ldots, n\}$ are independent across $t$. Moreover, $\{z_t: t = 1, \ldots, n\}$ are independent of $\{x_t: t = 1, \ldots, n\}$.

For this model, $x_t$ and $y_t$ are dependent but uncorrelated. Simulation results indicate that the proposed test statistic $S_n$ cannot detect the dependence between them. Nevertheless, if we substitute $X_{it}^2$ and $Y_{jt}^2$ for $X_{it}$ and $Y_{jt}$, respectively, in the matrix $S_{xy}$, then the resulting statistic $S_n$ captures this type of dependence. This is because the squares are correlated: $\operatorname{cov}(X_{it}^2, Y_{it}^2) = (\alpha_0 + 3\alpha_1) - (\alpha_0 + \alpha_1) = 2\alpha_1$.
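A sketch of the ARCH-type design (6.5)-(6.6) in NumPy, with a hypothetical function name. It illustrates the point above: the levels are uncorrelated, while the squares have covariance $2\alpha_1$.

```python
import numpy as np


def arch_type_sample(p1, p2, n, alpha0, alpha1, rng):
    """Y_it = Z_it * sqrt(alpha0 + alpha1 * X_it^2) for i <= min(p1, p2), Y_jt = Z_jt otherwise;
    X (p1 x n) and Z (p2 x n) have independent N(0, 1) entries."""
    X = rng.standard_normal((p1, n))
    Z = rng.standard_normal((p2, n))
    Y = Z.copy()
    m = min(p1, p2)
    Y[:m] = Z[:m] * np.sqrt(alpha0 + alpha1 * X[:m] ** 2)
    return X, Y


rng = np.random.default_rng(6)
X, Y = arch_type_sample(2, 3, 200000, 0.5, 0.5, rng)
level_corr = np.corrcoef(X[0], Y[0])[0, 1]              # close to 0
square_cov = np.cov(X[0] ** 2, Y[0] ** 2)[0, 1]         # close to 2 * alpha1 = 1
```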

Tables 7 and 8 list the powers of the proposed statistics $S_n$ and $T_n$ for testing model (6.5) with several values of $(\alpha_0, \alpha_1)$.
Table 7

Empirical powers of the proposed test $S_n$ at the 0.05 significance level for $x$ and $y$ with ARCH(1) type dependence

(p1, p2, n)       (0.9, 0.1)   (0.8, 0.2)   (0.7, 0.3)   (0.6, 0.4)   (0.5, 0.5)
(10, 20, 40)      0.3480       0.4670       0.6380       0.7650       0.8500
(30, 60, 120)     0.4840       0.8090       0.9820       0.9990       1.0000
(50, 100, 200)    0.6190       0.9730       1.0000       1.0000       1.0000
(70, 140, 280)    0.7020       0.9980       1.0000       1.0000       1.0000
(90, 180, 360)    0.7900       1.0000       1.0000       1.0000       1.0000
(110, 220, 440)   0.8620       1.0000       1.0000       1.0000       1.0000
(130, 260, 520)   0.8970       1.0000       1.0000       1.0000       1.0000
(150, 300, 600)   0.9440       1.0000       1.0000       1.0000       1.0000
(170, 340, 680)   0.9520       1.0000       1.0000       1.0000       1.0000
(190, 380, 760)   0.9810       1.0000       1.0000       1.0000       1.0000
(210, 420, 840)   0.9880       1.0000       1.0000       1.0000       1.0000
(230, 460, 920)   0.9950       1.0000       1.0000       1.0000       1.0000
(250, 500, 1000)  0.9980       1.0000       1.0000       1.0000       1.0000

The powers are under the alternative hypothesis that $Y_{it} = Z_{it}\sqrt{\alpha_0 + \alpha_1 X_{it}^2}$, $i = 1, 2, \ldots, p_1$; $Y_{jt} = Z_{jt}$, $j = p_1 + 1, \ldots, p_2$. Each column heading is the value of $(\alpha_0, \alpha_1)$.


For the statistic $T_n$, since the covariance matrix of $x$ is the identity matrix, we use the original statistic $T_n$ in Theorem 2.3. From the tables, we observe that the powers increase as $\alpha_1$ increases. This is consistent with intuition, because a larger $\alpha_1$ brings about a stronger correlation between $Y_{it}^2$ and $X_{it}^2$.


7. Empirical applications. As an application of the proposed independence test, we test the cross-sectional dependence of daily stock returns of companies between two different sections of the New York Stock Exchange (NYSE) during the period 2000.1.1-2002.1.1. The sections include the consumer service, consumer duration, consumer nonduration, energy, finance, transport, healthcare, capital goods, basic industry and public utility sections. The data set is obtained from the Wharton Research Data Services (WRDS) database.

We randomly choose $p_1$ and $p_2$ companies from two different sections, respectively, such as the transport and finance sections. At each time $t$, denote the closing stock prices of these companies from the two sections by $x_t = (x_{1t}, x_{2t}, \ldots, x_{p_1t})^T$ and $y_t = (y_{1t}, y_{2t}, \ldots, y_{p_2t})^T$, respectively. We consider the daily stock returns $r_t^x = (r_{1t}^x, \ldots, r_{p_1t}^x)^T$ and $r_t^y = (r_{1t}^y, \ldots, r_{p_2t}^y)^T$ with $r_{it}^x = \log(x_{it}/x_{i,t-1})$, $i = 1, 2, \ldots, p_1$, and $r_{jt}^y = \log(y_{jt}/y_{j,t-1})$, $j = 1, 2, \ldots, p_2$. The goal is to test the dependence between $r_t^x$ and $r_t^y$.
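A sketch of the return construction and the random choice of companies, assuming the closing prices of each section are stored as a (companies × days) NumPy array. The function names are hypothetical, and no WRDS access code is implied.

```python
import numpy as np


def log_returns(prices):
    """r_it = log(p_it / p_{i,t-1}) for a (companies x days) array of closing prices."""
    prices = np.asarray(prices, dtype=float)
    return np.log(prices[:, 1:] / prices[:, :-1])


def random_section_pair(prices_a, prices_b, p1, p2, rng):
    """Pick p1 and p2 companies at random from two sections; return X (p1 x n), Y (p2 x n)."""
    rows_a = rng.choice(prices_a.shape[0], size=p1, replace=False)
    rows_b = rng.choice(prices_b.shape[0], size=p2, replace=False)
    return log_returns(prices_a[rows_a]), log_returns(prices_b[rows_b])
```

Each call produces one pair of sample matrices; repeating it, as in the experiments below, yields a collection of P-values for the same pair of sections.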
Table 8

Empirical powers of the proposed test $T_n$ at the 0.05 significance level for $x$ and $y$ with ARCH(1) type dependence

(p1, p2, n)       (0.9, 0.1)   (0.8, 0.2)   (0.7, 0.3)   (0.6, 0.4)   (0.5, 0.5)
(100, 50, 80)     0.6020       0.6180       0.7270       0.8930       0.9660
(140, 70, 120)    0.6370       0.7890       0.8020       0.8990       0.9820
(180, 90, 150)    0.7490       0.8280       0.9090       0.9920       1.0000
(200, 100, 170)   0.8130       0.8730       0.9930       1.0000       1.0000
(240, 120, 180)   0.8920       0.9720       0.9950       1.0000       1.0000
(280, 140, 250)   0.9470       0.9870       1.0000       1.0000       1.0000
(320, 160, 270)   0.9900       0.9980       1.0000       1.0000       1.0000
(360, 180, 290)   0.9910       0.9940       1.0000       1.0000       1.0000
(400, 200, 310)   0.9890       0.9950       1.0000       1.0000       1.0000
(440, 220, 330)   0.9920       1.0000       1.0000       1.0000       1.0000
(480, 240, 350)   0.9980       0.9970       1.0000       1.0000       1.0000

The powers are under the alternative hypothesis that $Y_{it} = Z_{it}\sqrt{\alpha_0 + \alpha_1 X_{it}^2}$, $i = 1, 2, \ldots, \min(p_1, p_2)$, and, if $p_1 < p_2$, $Y_{jt} = Z_{jt}$, $j = p_1 + 1, \ldots, p_2$. Each column heading is the value of $(\alpha_0, \alpha_1)$. The parameter $t$ in the statistic $T_n$ takes the value 40. The original statistic $T_n$ in Theorem 2.3 is used.


The proposed test $S_n$ is applied to test the dependence of $r_t^x$ and $r_t^y$. For each $(p_1, p_2, n)$, we randomly choose $p_1$ and $p_2$ companies from two different sections, construct the corresponding sample matrices $X = (r_1^x, r_2^x, \ldots, r_n^x)$ and $Y = (r_1^y, r_2^y, \ldots, r_n^y)$, and then calculate the P-value of the proposed test. We repeat this procedure 100 times, obtaining 100 P-values, to see whether cross-sectional dependence is prevalent between the two tested sections.

We test the independence of daily stock returns of companies from three pairs of sections: the basic industry and capital goods sections, the public utility and capital goods sections, and the finance and healthcare sections. Tables 9, 10 and 11 show that, as the numbers of companies $(p_1, p_2)$ increase, more experiments are rejected in terms of P-values below 0.05. This shows that cross-sectional dependence exists and is prevalent between different sections of the NYSE, which suggests that the assumption of cross-sectional independence in such empirical studies may not be appropriate.

8. Acknowledgement. The authors would like to thank the Editor, an Associate Editor and the referees for their constructive comments and suggestions, which significantly improved this paper.
Table 9

P-values for $(p_1, p_2)$ companies from the basic industry section and the capital goods section of NYSE

P-value interval   No. of exp., (p1, p2, n) = (10, 15, 20)   No. of exp., (p1, p2, n) = (15, 20, 25)
[0, 0.05]          56                                         60
[0.05, 0.1]        22                                         20
[0.1, 0.2]         9                                          12
[0.2, 0.3]         2                                          5
[0.3, 0.4]         10                                         0
[0.4, 0.5]         1                                          3
[0.6, 0.7]         0                                          0
[0.8, 0.9]         0                                          0
[0.9, 1]           0                                          0

These are P-values for $(p_1, p_2)$ companies from two different sections of NYSE, the basic industry section and the capital goods section, each of which has $n$ daily stock returns during the period 2000.1.1-2002.1.1. The number of repeated experiments is 100. All closing stock prices are from the WRDS database. "No. of exp." is the number of experiments whose P-values fall in the corresponding interval.



Table 10

P-values for $(p_1,p_2)$ companies from the public utility section and the capital goods section of NYSE

                       No. of exp.
P-value interval       $(p_1,p_2,n) = (10,15,20)$     $(p_1,p_2,n) = (15,20,25)$
[0, 0.05]                        76                              84
[0.05, 0.1]                      10                              12
[0.1, 0.2]                        4                               2
[0.2, 0.3]                        7                               1
[0.3, 0.4]                        0                               1
[0.4, 0.5]                        2                               0
[0.6, 0.7]                        1                               0
[0.8, 0.9]                        0                               0
[0.9, 1]                          0                               0


These are P-values for $(p_1,p_2)$ companies from two different sections of NYSE, the public utility section and the capital goods section, with $n$ daily stock returns per company during the period 2000.1.1-2002.1.1. The number of repeated experiments is 100. All closing stock prices are from the WRDS database. No. of exp. is the number of experiments whose P-values fall in the corresponding interval.
Table 11

P-values for $(p_1,p_2)$ companies from the finance section and the healthcare section of NYSE

                       No. of exp.
P-value interval       $(p_1,p_2,n) = (10,15,20)$     $(p_1,p_2,n) = (15,20,25)$
[0, 0.05]                        90                              92
[0.05, 0.1]                       4                               5
[0.1, 0.2]                        5                               1
[0.2, 0.3]                        1                               2
[0.3, 0.4]                        0                               0
[0.4, 0.5]                        0                               0
[0.6, 0.7]                        0                               0
[0.8, 0.9]                        0                               0
[0.9, 1]                          0                               0


These are P-values for $(p_1,p_2)$ companies from two different sections of NYSE, the finance section and the healthcare section, with $n$ daily stock returns per company during the period 2000.1.1-2002.1.1. The number of repeated experiments is 100. All closing stock prices are from the WRDS database. No. of exp. is the number of experiments whose P-values fall in the corresponding interval.


SUPPLEMENTARY MATERIAL 

Supplement to “Independence test for high dimensional data based on regularized canonical correlation coefficients”

(DOI: 10.1214/14-AOS1284SUPP; .pdf). The supplementary material is divided into Appendices A and B. Appendix A contains some useful lemmas and the proofs of all theorems and of Propositions 4-5, while Appendix B provides a theorem on the CLT for a sample covariance matrix plus a perturbation matrix.